ABSTRACT


Interprocess Communication (IPC) has been recognized as
a critical issue in the design and implementation of all
modern operating systems. IPC policies and mechanisms are
even more central in the design of highly distributed
processing systems -- systems exhibiting short-term dynamic
changes in the availability of physical and logical resources
as well as interconnection topology. A workshop on this
subject was held at the Georgia Institute of Technology in
November 1979. Four working groups, 1) Addressing, Naming,
and Security, 2) Interprocess Synchronization, 3)
Interprocess Mechanisms, and 4) Theory and Formalism,
addressed the current state of the art in these areas as well
as problems and future research directions. This report
incorporates much of the material and working papers from
those fields as well as selected references useful in
understanding the topic.


Georgia  Institute  of  Technology 


IPC  workshop 



PREFACE 


The workshop organizing committee had originally intended to
utilize the material developed by the individual working
groups to prepare a summary report of the proceedings. This
concept was abandoned when it was recognized that a "summary
report" would not adequately report on and document all of
the work and topics that were covered during the meeting.
It was obvious that documentation much more thorough than
merely a summary report was warranted, so the members of the
organizing committee decided to directly utilize as much as
possible of the material and notes prepared by the working
groups and assemble and edit that material into an organized
workshop report. It was felt that this approach would much
better capture the true flavor of the workshop and the
breadth of the material covered there.

SECTION 1
INTRODUCTION


1.1  SUBJECT OF THE WORKSHOP


The subject of the workshop was Interprocess Communication
Mechanisms with a particular focus on process-to-process
communications in highly distributed systems. Highly
distributed systems are characterized by very loose coupling
between physical resources as well as between logical
resources. Such systems also exhibit dynamic, short-term
changes in the topology and organization of the total
system. These characteristics place new requirements on the
design and performance of IPC mechanisms; these requirements
are assuming extreme importance in advancing the
state-of-the-art in all forms of distributed systems.


1.2  WORKSHOP ORIGINS


The last meeting that focused on interprocess communication
was the "ACM SIGCOMM/SIGOPS Interprocess Communications
Workshop" held 24-25 March 1975 [IPC 75].

One might conclude from the paucity of material published on
this topic since that workshop that the problem is totally
under control. (The BBN "Network Operating Systems" study
[THOM 78] cites only one reference since 1974.) Such is
definitely not the case. Work on IPC has been covered
within projects on operating systems; however, many
implementation and performance problems are only partially
solved or solved only on an ad hoc basis, and it appeared
that the time was ripe to again focus a meeting of
specialists onto this topic, especially in view of its key
role in the operation and performance of distributed
systems.

Since 1975 advances in the field of computer communications
have provided mechanisms for connecting computers together
in a variety of configurations. For instance, high speed
serial communication paths [METC 76, GORD 79] have permitted
effective local networks [CLAR 78], in which many computers
share specialized resources (storage, printing facilities,
etc.), while each node still retains some degree of
autonomy. In addition, many mini-computers support large
address spaces, and a corresponding high degree of
multiprogramming. One natural way to construct the software
for such systems is to base the software architecture on the
notion that most tasks will be performed by a collection of
communicating asynchronous processes, running on the same or
different processors. Such systems are known as "highly
distributed systems", and are characterized by a very loose
coupling between physical resources as well as between
logical resources, and they allow dynamic, short-term
changes in the topology and organization of the total
system.

The fact that these systems are very loosely coupled, both
physically and logically, places quite different demands on
IPC from those applicable to more tightly coupled
contemporary systems, even those incorporating a local network
as the interconnection mechanism. Practical attempts to
construct such systems immediately direct one's attention to
available Interprocess Communication (IPC) mechanisms and
their shortcomings. Lack of well constructed and well
understood mechanisms is the root of most of the difficulties
in building distributed systems.


1.3  PURPOSE AND SCOPE OF THE WORKSHOP

The "Workshop on Interprocess Communications in Highly
Distributed Systems" was intended to bring together a
selected group of workers in the subject area to address the
five general goals listed below:

1)  Assess the present state-of-the-art for IPC
    mechanisms in distributed data processing
    systems.

2)  Identify the data available on the actual
    performance of various IPC policies and
    mechanisms.

3)  Assess the potential value of various IPC
    mechanisms satisfying the operational and
    performance requirements for highly
    distributed systems.

4)  Identify shortcomings in the present
    state-of-the-art and identify promising areas for
    future research and experiment on this
    subject.

5)  Identify possible standardization levels of
    IPC.

The scope of the workshop will be limited to IPC mechanisms
for use in distributed systems. (This acknowledges fairly
common agreement among the research community that the
following are not DDP's: multiprocessors, computer networks
per se, intelligent terminal systems, and satellite
processor systems.)


1.4  mamas  as  mt  McaasHflE 


Workshop attendees were selected from individuals actively
working in the field, and the size of the workshop was
purposely limited to approximately 40 attendees. Special
attention was given to obtain participants who met one or
more of the following criteria:

-  Had had practical experience in the design and
   implementation of IPC policies and mechanisms in
   highly distributed systems.

-  Had analyzed and/or measured the actual
   performance of various IPC mechanisms.

-  Would contribute a written submission to the
   workshop.


The workshop was held from 12:00 noon, 20 November, through
12:00 noon, 22 November 1978, at the Atlanta Townehouse
Motor Hotel, immediately adjacent to the Georgia Tech
campus.

Before the workshop, invitees were requested to identify
their areas of interest. Based on that input, the
organizing committee established six working groups:

1)  Addressing and Security

2)  Fault Tolerance

3)  Synchronization, Signalling, and Flow Control

4)  Theory and Formalism

5)  Hardware and Primitives

6)  Programming Issues

However, as often (usually?) happens in such situations,
when the groups met and discussed their areas of interest,
realignments in the working group organization resulted in
four working groups rather than six:

1)  Addressing, Naming, and Security

2)  Interprocess Synchronization

3)  Mechanisms

4)  Theory and Formalism

The output of these four groups is the basis for this
report.


1.5  ATTENDEES


IPC WORKSHOP

LIST OF ATTENDEES

(*  Members  of  the  Organizing  Committee) 


Hal Abelson

Laboratory  for  Computer  Science 
Massachusetts  Institute  of  Technology 

Allen  Akin 

Georgia  Institute  of  Technology 

School  of  Information  &  Computer  Science 

Edwin  Basart 
Hewlett-Packard  Co. 

General  Systems  Division 

Morton  I.  Berstein 
System  Development  Corp. 

Bill Buckles

General  Research  Corp. 

James  E.  Burns 

Georgia  Institute  of  Technology 

School  of  Information  &  Computer  Science 

Gregory  Chesson  * 

Bell  Laboratories 

Wushow Chou

North  Carolina  State  University 
Computer  Studies 

Phillip  Crews 

Georgia  Institute  of  Technology 

School  of  Information  &  Computer  Science 

Richard A. DeMillo

Georgia Institute of Technology

School of Information & Computer Science

Philip H. Enslow, Jr. *

Georgia  Institute  of  Technology 

School  of  Information  &  Computer  Science 

Michael  Fischer 
University  of  Washington 
Department  of  Computer  Science 

Mark  Gang 

Ford Aerospace & Communications Corp.

Western  Development  Laboratories 

Robert  L.  Gordon  * 

PRIME  Computers 

Jim  Hamilton 

Digital  Equipment  Corp. 

Mohommad  Hassan 
MODCOMP 

Steven  F.  Holmgren 
Digital Technology, Inc.

Doug  Jensen  * 

Honeywell  Research 

(Presently Carnegie-Mellon University)

Richard Kain

University of Minnesota

Department of Electrical Engineering

Steve Kimbleton

Institute  for  Computer  Science  &  Technology 
National  Bureau  of  Standards 

Peter  Koschewa 

U.S. Army Institute for Research in Management
Information  and  Computer  Sciences 

Leslie  Lamport 
SRI  International 

David Lapin
Burroughs Corporation
Computer  Systems  Group 

Thomas  Lawrence 

Rome  Air  Development  Center 

U.S.  Air  Force 

Richard LeBlanc

Georgia Institute of Technology

School of Information & Computer Science

Gerard Le Lann
SIRIUS 

I R I  (France) 

Edward Y.S. Lee

TRW Defense & Space Systems Group

Jon Livesey

University  of  Waterloo 

Computer  Communications  Network  Group 

James R. Low
University  of  Rochester 
Department  of  Computer  Science 

Nancy A. Lynch

Georgia  Institute  of  Technology 

School  of  Information  &  Computer  Science 

Edith  Martin 

Georgia  Institute  of  Technology 
Engineering  Experiment  Station 

Wayne  McCoy 

Kennedy  Space  Flight  Center 
NASA 

Nancy Meisner

University  of  Waterloo 

Computer  Communications  Network  Group 

Ira  Newman 

Department  of  Defense 

Richard  Peebles 
Digital  Equipment  Corp. 

Steve Ratzel

U.S. Army Institute for Research in Management
Information  and  Computer  Sciences 

Donald  Sharp 

Georgia  Institute  of  Technology 

School  of  Information  &  Computer  Science 

David Sincoskie

University  of  Delaware 

Department  of  Electrical  Engineering 

Stephen W. Smoliar
General Research Corp.

John Staudhammer

U.S. Army Research Office

Carl Sunshine
Rand  Corporation 

(Present location: ISI, University of Southern California)

Joseph  S.  Sventek 

Lawrence  Berkeley  Laboratories 

Computer  Science  &  Applied  Mathematics 

P. S. Thiagarajan

Institut fuer Informationssystemforschung
GMD 

Virgil E. Wallentine
Kansas  State  University 
Department  of  Computer  Science 

Don  Weir 

Telenet  Communication  Corp. 

Douglas E. Wrege

Georgia  Institute  of  Technology 

Engineering  Experiment  Station 


1.6  ORGANIZATION OF THE REPORT


Following this introductory section, there is a short
section on the general background of interprocess communication
techniques. The main body of this report is Sections 3, 4,
5, and 6, which cover the results of each of the Working
Groups. Within each section, the first material presented
is a summary of the Working Group presentation made at the
end of the workshop. Following that, there is, in some
instances, a collection of amplifying material and
selections from the position papers that were prepared prior to
the workshop and distributed to the attendees.

Section 7 contains several longer papers that were either
prepared specifically for distribution at the workshop or
were felt by the authors to be applicable to the workshop
and were distributed to the attendees there. Section 8 is a
very brief summary and discussion of future directions for
IPC, and Section 9 contains the references utilized in the
report.

SECTION 2
BACKGROUND


2.1  INTRODUCTION


Probably the single most important hindrance to the
development of interprocess communication has been the lack of
general acceptance and agreement on the notion and
abstraction of a "process." Until the "process model" of
computation becomes generally accepted and used as the basis of
software architectures, there will be little motivation for
interprocess communication mechanisms.

In most systems the abstraction of a "process" has not been
developed well enough for it to be treated as an "object" in
its own right so that "processes" can be used conveniently
by system architects and others as building blocks.
Primitives for the creation, synchronization, addressing,
and communication of processes have in the past only been
generally available to operating system developers, and
therefore not widely used by application programmers in
applications software systems. Unfortunately operating system
developers tend to live with and use poorly documented
experimental primitives and other ad hoc mechanisms. The
notable exceptions to this rule form the core body of
classic literature in this field [BRIN 69, DIJK 68b, DIJK 71,
DALE 68]. For the most part, application programmers in the
past have been restricted to conventional I/O using shared
files as a pragmatic method of interprocess communication,
with only partial success.


When the notion of a "process" becomes recognized as a
fundamental building block for distributed applications,
stronger support and documentation will have to be provided
by the system suppliers and manufacturers, thus making
available to application coders a robust set of
"process-based" primitives. After such widespread support
materializes, the design experience and performance
statistics will provide the basis for a fuller understanding
of all aspects of interprocess communication.

A comprehensive survey of the present state-of-the-art in
interprocess communication is presented in paragraph 7.6.

2.2  PROCESS MODEL OF COMPUTATION


An excellent survey of the "process model of computation"
can be found in [HORN 73]. Prior to this, articles on
operating systems developed the notion of a "process" or
"task" as an entity that could be scheduled and own other
resources in multiprogrammed systems, but they did not treat
a process as a structuring methodology in its own right.
Examples of these notions can be found in [SALT 66] and [IBM
71].

Access to resources in early operating systems presented the
very first examples of interprocess communication, but these
early IPC techniques varied widely from one implementation
to the next. For example, in most systems, the line printer
daemon (or process) owned the line printer, and access to
the printer was restricted to ordinary "write" statements at
the language level coupled with "logical unit" assignment at
the job control or command language level. Other examples
may be found where the login process "owns" the
communication lines, or a file manager owns the file system as in
the MERT operating system [LYCK 78]. An early message-based
operating system structured around processes is the RC4000
operating system [BRIN 69, BRIN 70].

Trends in software engineering, applications, and technology
certainly point to an increasing awareness of a process as a
fundamental method of structuring systems. The
proliferation of inexpensive processors and low cost bandwidth
suggests a process model of computation, even if there is only
one process per processing element, since control and
sharing of common resources must be by some form of interprocess
communication. New architectures are now being proposed
that exploit these trends, e.g. [NELS 78]. The [NELS 78]
proposal is based on a high-speed packet-oriented bus
interconnecting a large number of processor-memory pairs, termed
"cells." Each cell includes a CPU, a primary memory system
(typically one or two megabytes), a packet bus node
controller, and possibly some peripherals such as disks or
communications devices. The architecture supports applications
decomposed at the process level; the entire system is viewed
as a set of cooperating processes, distributed among the
cells to improve performance, cost, or availability.


2.3  HIGHLY DISTRIBUTED SYSTEMS


Highly distributed systems are characterized by very loose
coupling between physical as well as logical resources. In
addition they exhibit dynamic, short-term changes in the
topology and organization of the total system. The fact
that these systems are very loosely coupled, both physically
and logically, places quite different demands on IPC from
those applicable to more tightly coupled contemporary
systems, even those incorporating a "network" as the
interconnection mechanism.

Such systems should support multiple name spaces, including
the management and translation of file and unit names in
these name spaces. In addition, such systems should handle
abstractions built from collections of communicating
processes and provide mechanisms for addressing and synchronizing
groups of processes. High bandwidth message transport
mechanisms will potentially allow multiple logical
connections between processes to be constructed whenever
convenient, but system support must be available for those
connections to be useful. To date, very little experience is
available to assist a designer attempting to construct
complex systems out of communicating processes.


2.4  IPC STRUCTURES


Most existing IPC primitives and structures are based on a
"two-party" communication model, in which there is a single
"sender" and a single "receiver" for each transaction or
message. (This is certainly the basis for IPC facilities
built around the X.25 level 3 protocol [CCIT 78].) Other
kinds of communication facilities may better support ring,
tree and general graph models of process networks.
Protocols involving more than two processes are called
"N-process" protocols [PARD 79]; they should find use in shared
data base and electronic mail systems.

The major functions supporting these protocols are storing,
forwarding and routing variable length messages. These
functions can be difficult to implement if communication
links, processing nodes, or other resources are only
partially available.


2.5  INTERPROCESS CONTROL STRUCTURES


Communication links between processes can be allocated
strictly to control functions. In fact, the degree of
separation of control and data is an important research
issue. A path primarily used for the transport of data may
have no mechanism for control or "out of band" signalling,
which may make error detection and recovery difficult, if
not impossible. The system's control path structure is
primarily determined by the "control model" used during
system development. The "classical" system organizations
are a) master/slave, b) hierarchical, c) democratic, or d)
autonomous. The first two are well understood and readily
implemented, while the latter control organizations are not
well understood (in an algorithmic sense) and are the
subject of much research [HOAR 78].


SECTION 3

ADDRESSING, NAMING, and SECURITY

3.1  H  £M£  SUttflASI  fLimi 

What are objects?

    files, processes, devices

Uniform mechanism?

    File metaphor -- UNIX
    Process metaphor -- MININET, RC4000
    Abstractions -- WEB

Worldview: (a la DISY)

    Universe >>> Systems >>> Objects

Distinguish between:

    NAMES -- what
    ADDRESSES -- where
    ROUTES -- how to reach

Basic Problem: map

    NAMES >>> ADDRESSES

Desirable features:

    Generic naming
    Context independence
    Location independence
    Broadcast (group name)
    Uniqueness
    Path addressing

Other concerns:

    Flat vs. hierarchical
    Centralized vs. distributed
    Steps
    Search rules
    Connections
    Transactions

Merging two systems:

    1.  one below other
    2.  both below new prefix
    3.  corresponding unused addresses

Name >>> Address mapping may be separate from IPC.

    IPC between specific addresses
    Directory object with well-known address

DISY "MAILBOX"

    Generic naming
    Location independent
    Uniqueness
    Object pointer
    Resource limits
    Access controls

Security:

Main attributes of subject:

    Logical identity
    Physical location

Problems:

    1.  authentication + access
        control of location
    2.  storing authorization on areas
        outside security environment
    3.  moving objects if encryption
        based on location

3.2  AMPLIFYING MATERIAL


What are objects? files, devices, processes

-  What things should be in a list of primitive
   objects?

-  Should we choose one object type to represent
   all objects?


Should there be a uniform mechanism for all objects?

-  file "metaphor" - Unix [THOM 74]

-  process "metaphor" - Mininet [PEEB 78], RC 4000
   (performance?)

-  abstractions

   -  WEB at DEC (performance?)

   -  Capability based systems


A uniform mechanism is a good thing. Being able to do this
requires picking one of the above. Not sure we can.

Worldview: ANSI/SPARC/DISY [DESJ 78] or ISO SC 16 model

-  Universe consists of multiple systems.

-  Systems have many objects.


Distinguish between NAMES (what), ADDRESSES (where), ROUTES
(how to reach). (See [SHOC 78].)

Basic Problem: mapping NAMES to ADDRESSES.

Desirable features of this mapping:

1)  generic naming - many potential servers

    -  within one system or across
       systems

    -  selected by server or by
       requestor ("request for service"
       facility is just latter [FARB
       73])

2)  location independence - same name may be used
    no matter where server is located

3)  broadcast - (group name) - communication with
    multiple servers

4)  uniqueness - only one name for given object
    or set of objects at some level

5)  path addressing or source routing - source
    specifies sequence of addresses to reach
    object. Useful if "system" does not know
    route, or if destination is outside normal
    name space.


Additional mapping concepts:

1)  Flat vs. hierarchical - latter allows each
    directory or switch to know only about
    elements at its own level --> many smaller
    directories vs. one large one.

2)  Centralized vs. distributed - centralized
    can be reliable, but requires roundtrip delay
    to get information, high load at center.
    Distributed may allow local lookup, or may
    require broadcast. Update more complex.

3)  There may be many directories, and many
    "steps" in the address lookup. Example: "my
    name" to global name, global name to system
    address/local name, (send to remote system),
    local name to local address.

4)  Search rules - each user may have rules for
    tailoring lookup to his needs.
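
The multi-step lookup in item 3 can be sketched as a chain of
small table lookups. This is only an illustration under stated
assumptions: the table names, the user "alice", the host
"host-7" and all addresses are hypothetical and do not come
from any system discussed at the workshop.

```python
# Hypothetical tables; every name and address below is invented for illustration.

# Step 1: per-user search rules map "my name" to a global name.
USER_ALIASES = {"alice": {"printer": "global/print-service"}}

# Step 2: a global directory maps a global name to (system address, local name).
GLOBAL_DIRECTORY = {"global/print-service": ("host-7", "lp0")}

# Step 3: each system keeps its own directory of local name -> local address.
LOCAL_DIRECTORIES = {"host-7": {"lp0": 42}}


def resolve(user, my_name):
    """Walk the three lookup steps and return (system address, local address)."""
    global_name = USER_ALIASES[user][my_name]
    system_addr, local_name = GLOBAL_DIRECTORY[global_name]
    # In a real system this last step would run on the remote system,
    # after the request has been sent there.
    local_addr = LOCAL_DIRECTORIES[system_addr][local_name]
    return system_addr, local_addr
```

Each table is small and is consulted only at its own level,
which is the advantage claimed for hierarchical directories
in item 1.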


NAME --> ADDRESS mapping may be costly. Hence desire to do
it once for many successive messages to same destination.
Leads to connection notion. May include route setup.
Caching of recently used names/addresses also helpful.
Connection also needed when desired that successive messages
to a given name go to the same object, in order. If
transactions are independent, then a different instance of
the named object can serve each - no connection needed.
[NSW  7  o ] 
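
A minimal sketch of the connection notion, assuming a
hypothetical Directory class that counts lookups and a
hypothetical Connection class that performs the name-to-address
mapping once and reuses the result. Neither class corresponds
to a real system; they only illustrate why a connection saves
repeated lookups and keeps successive messages going to the
same object in order.

```python
class Directory:
    """Hypothetical name -> address directory that counts its lookups."""

    def __init__(self, table):
        self.table = dict(table)
        self.lookups = 0

    def lookup(self, name):
        self.lookups += 1
        return self.table[name]


class Connection:
    """Resolves the name once; every later message reuses the address."""

    def __init__(self, directory, name):
        self.address = directory.lookup(name)  # the costly mapping, done once
        self.delivered = []                    # stands in for the transport

    def send(self, message):
        # Messages go to the same address, in the order they were sent.
        self.delivered.append((self.address, message))
```

With independent transactions the caller could instead call
lookup() per message and accept whichever instance of the
named object the directory returns.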

Problem of merging two previously independent systems:

1)  May add "prefix" to all addresses (a higher
    level in hierarchy) to distinguish systems.

2)  Make one system "below" other in hierarchy.

3)  Make unused addresses in each system
    correspond to addresses in other system.
    Only good for small numbers.
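
Option 1 can be sketched as follows, assuming each system is
represented as a flat mapping from address to object. The
function name and the default prefixes "A" and "B" are
hypothetical.

```python
def merge_with_prefixes(system_a, system_b, prefix_a="A", prefix_b="B"):
    """Merge two address spaces by adding a higher-level prefix to each.

    Addresses that collide in the original systems stay distinct because
    the merged address is the pair (prefix, original address).
    """
    merged = {}
    for prefix, system in ((prefix_a, system_a), (prefix_b, system_b)):
        for address, obj in system.items():
            merged[(prefix, address)] = obj
    return merged
```

Existing addresses remain valid inside their own system; only
references that cross between the two systems need the prefix.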


NAME --> ADDRESS translation may be separate from basic IPC,
which is between specific addresses only. Then a directory
object (process) with well-known address can be accessed to
provide translation, with result returned via basic IPC.
Then requestor does basic IPC with specific address of
service actually desired. Examples: ARPANET Initial
Connection Protocol, Mininet [PEEB 78].
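
The separation described above can be sketched with a toy
message layer that only delivers to addresses, plus a directory
reachable at a well-known address. The Network class, the
address 0, and the helper names are assumptions made for this
illustration; they are not the ARPANET or Mininet interfaces.

```python
DIRECTORY_ADDRESS = 0  # assumed well-known address of the directory object


class Network:
    """Basic IPC: messages are delivered to specific addresses only."""

    def __init__(self):
        self.handlers = {}

    def register(self, address, handler):
        self.handlers[address] = handler

    def send(self, address, message):
        # Deliver the message and return the reply from that address.
        return self.handlers[address](message)


def make_directory(table):
    """Build a directory handler translating a name to an address."""
    def handler(name):
        return table.get(name)  # None when the name is unknown
    return handler


def request_service(net, name, payload):
    """Two basic IPC exchanges: one with the directory, one with the service."""
    address = net.send(DIRECTORY_ADDRESS, name)
    if address is None:
        raise LookupError(name)
    return net.send(address, payload)
```

The message layer never sees a name; translation is just one
more service reached through the same basic IPC.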

Important Example: Our view of DISY "mailbox" [DESJ 78] has
properties or components:

-  generic name

-  location independent

-  uniqueness

-  pointer to object (process) mailbox stands for

-  resource control (how many in use)

-  access controls, owner


Security:

1)  Does not include reliability, failure
    recovery.

2)  Does include authentication, access controls,
    encryption, correctness.

3)  Basic goal - allow objects to be accessed
    only by specified subject.

4)  Two main attributes of subject:

    -  logical identity

    -  physical location

5)  Problems:

    a)  Allow object to be accessed
        from one place but not another
        (e.g., not via dial-in). Must
        authenticate location as well
        as identity.

    b)  Removable media plus unsecured
        sources: Can authorization
        information be stored in areas
        outside of physical control?

    c)  Encryption problem. If
        authorizations are encrypted
        based on location of object,
        how can object move? (Two
        constraints: need to give
        authorizations to others, but
        must not be forgeable (hence
        encryption)).

3.3  CASE STUDIES


3.3.1  Distributed Data Base


by 

Edward  Lee 
TRW 

Most DDB protocols seem to assume that Data Base Managers
can figure out how to communicate between themselves and
that naming one another is not a problem. Is it reasonable
to assume that file system operations and process IPC are
basically the same mechanism? DISY has process as the basic
communicating object. You basically open a channel to a
process and then communicate directly with it. It is the
Session Controller (DISY) which opens the channel for you.


3.3.2  Mininet

by

J. Livesey

University of Waterloo

Mininet is a system in which addressing is basically
separate from IPC. In many systems some form of addressing
method (name --> address translation) is implicit in IPC.

In Mininet, IPC consists solely of the transmission of a
message from a Sender Task to a Receiver Task, which has to
be identified by an integer Task Identifier (an address
rather than a name). In the distributed case the host id is
concatenated with the task identifier within the host.
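
The concatenation of host id and task identifier can be
sketched with integer bit fields. The 16-bit task field is an
assumption chosen for this example; the source does not state
the widths Mininet used.

```python
TASK_BITS = 16  # assumed width of the per-host task identifier


def global_task_id(host_id, task_id):
    """Concatenate a host id with a task identifier local to that host."""
    if not 0 <= task_id < (1 << TASK_BITS):
        raise ValueError("task identifier does not fit in the task field")
    return (host_id << TASK_BITS) | task_id


def split_task_id(global_id):
    """Recover (host id, local task identifier) from a global identifier."""
    return global_id >> TASK_BITS, global_id & ((1 << TASK_BITS) - 1)
```

For example, global_task_id(3, 5) places host 3 in the high
bits and task 5 in the low bits, and split_task_id reverses
the operation.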

The question then is how to get the task identifier for a
task to perform a particular function.

In fact, all system resources (tasks, files, devices,
directories, ...) are formalized as tasks. A task has code and
data segments. A file, for instance, is a task whose code
segments are the Access Method and whose data segments are
code segments. A file task gets messages of the form:

read (record #)

and reacts by returning a message to the user containing the
record data.

There is only one well-known task in each host, the
Directory Task, which has the responsibility to maintain a
list relating function name (a character string) to task
identifier for each task in this host. As the ultimate
parent of each task it can find out their task ids. (The task
identifier of a new task is returned to the creating task,
the parent.) Now, when user task A, for instance, wants to
perform


open (filename)

it does so by asking the directory task for the identifier
of the "file-open" task. Assuming this exists locally, the
directory task returns its task id. The user now
communicates directly with "file-open" (a la DISY session) and
sends it a message

"open (filename)"


The task "file-open" now creates a file task whose data
segments are the data records of "filename" and returns the
"file" task identifier to the user task.

The user task now communicates with the "file" task (a
second host session a la DISY) with messages

"read (record #)"

"write (record #)"

"close ()"


The "file-open" task handles mutual exclusion on the file
(by refusing to create new file tasks for the same file as
long as someone has it open to write). The "file" task
handles record mutual exclusion.

In the case where no task exists in the local host to handle
function "X", the local directory task talks to remote
directory tasks, which are responsible for knowing which tasks
exist in their hosts (and which can be created to do "X").
Directory tasks announce themselves to one another at boot
time.
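
The lookup and open sequence described above can be sketched as follows. This is only an illustrative model, not Mininet code: the class names, the `handle` and `send` methods, and the in-memory `disk` dictionary are all assumptions made for the example. The directory task is modelled as a lookup table on the host rather than as a separate task, and the file-open task here refuses any second open of a file that is still open, which is stricter than the write-only exclusion in the text.

```python
class FileTask:
    """A file modelled as a task: its data segments are the file's records."""

    def __init__(self, records):
        self.records = records
        self.is_open = True

    def handle(self, message, *args):
        if message == "read":
            return self.records[args[0]]
        if message == "write":
            self.records[args[0]] = args[1]
            return "ok"
        if message == "close":
            self.is_open = False
            return "ok"
        raise ValueError("unknown message: " + message)


class FileOpenTask:
    """Creates one file task per file and enforces file-level exclusion."""

    def __init__(self, host, disk):
        self.host = host
        self.disk = disk          # hypothetical store: filename -> list of records
        self.file_tasks = {}      # filename -> task id of the latest file task

    def handle(self, message, filename):
        if message != "open":
            raise ValueError("unknown message: " + message)
        tid = self.file_tasks.get(filename)
        if tid is not None and self.host.tasks[tid].is_open:
            return None           # refuse: someone still has the file open
        tid = self.host.create(FileTask(self.disk[filename]))
        self.file_tasks[filename] = tid
        return tid


class Host:
    """One host; `functions` plays the role of the directory task's list."""

    def __init__(self, host_id):
        self.host_id = host_id
        self.tasks = {}           # task id -> task
        self.functions = {}       # function name -> task id
        self.next_id = 1

    def create(self, task, function_name=None):
        # Host id concatenated with the task identifier within the host.
        tid = (self.host_id, self.next_id)
        self.next_id += 1
        self.tasks[tid] = task
        if function_name is not None:
            self.functions[function_name] = tid
        return tid

    def lookup(self, function_name):
        return self.functions.get(function_name)

    def send(self, tid, message, *args):
        return self.tasks[tid].handle(message, *args)


host = Host("A")
host.create(FileOpenTask(host, {"notes": ["first", "second"]}), "file-open")
opener = host.lookup("file-open")            # ask the directory
file_tid = host.send(opener, "open", "notes")
record = host.send(file_tid, "read", 1)
```

The user task never names the file task directly; it only holds the identifier returned by "file-open", which is the point of the scheme.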

References:

[PEEB 78]
[LIVE 78a]
[LIVE 79b]

3.3.3 Discussion

Meisner:

Is this more complicated than a straight function
CALL/RETURN system?

Livesey:

Yes, but more flexible, since you can impose a function
CALL/RETURN system on top of the basic task/message-passing
system using library routines if you want. It
is also assumed that we have a homogeneous system.

Sunshine:

Clearly we can have server processes to guard and
administer

directories
open function
file tasks
etc.


Lapin: 

We need hardware to support process invocation/context
switch better than at present.

Livesey:

Yes, but future hardware should not lock us into
function call/process invocation capabilities, etc.

Sunshine: 

Curiously, in Mininet, every resource (object) is a
task (process), but the creation of a process involves
reading a file (an object containing its code
segments).

Enslow:

Lee says that his distributed data base should be
redundant. Does the system itself select the optimal
record?

Lapin:

Redundancy increases the reliability of the system.

Livesey:

We have both homogeneous and heterogeneous redundancy
here.

Homogeneous

- identical copies of data

- increases reliability

Heterogeneous

- copies of non-identical objects to perform
similar functions, e.g. FORTRAN compilers

- increases system bandwidth


McCoy:

Can we get a system to give us both?

Sunshine: 

To do it across several systems has a cost, and we have
to ask if the utility of redundancy is worth the cost.
The ARPANET Resource Sharing Executive (RSEXEC) was a
stripped-down operating system for remotely logged-in
users, who actually executed on the first available
DEC 10 but never knew which one. This was also an
attempt to provide a network-wide file system. Multiple
server systems such as the Irvine Net recognize the
need to go across the system to get resources. To use
this we may need utility programs to perform

local COBOL --> ANSI COBOL

and maybe even

ANSI COBOL --> local COBOL


Livesey:

We may also have a network JCL, so that a user only uses
the JCL of his local machine, and then we need to be
able to do the translation

Local JCL #1 --> Network JCL --> Local JCL #2


Lapin: 

There are two approaches to a file system spanning
multiple UNIX systems. We can have

/net

as a special file and address files on machines A, B,
etc. as

/net/A/pathname ...

/net/B/pathname ...

We can also localize the host id in the pathname explicitly:

part1/part2

part1: host id    part2: pathname
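
The two naming forms above can be sketched as a pair of small parsers. This is a hedged illustration only: the function names, the `local_host` default, and the convention that the pathname part keeps its leading slash are assumptions for the example, not details given in the discussion.

```python
def split_net_path(path, local_host="local"):
    """Split "/net/<host>/<pathname>" into (host, pathname).

    Any path that does not start with the special /net directory is
    treated as a name on the local machine.
    """
    parts = path.split("/")          # "/net/A/usr/x" -> ["", "net", "A", "usr", "x"]
    if len(parts) >= 4 and parts[0] == "" and parts[1] == "net" and parts[2]:
        return parts[2], "/" + "/".join(parts[3:])
    return local_host, path


def split_explicit(name):
    """Split "<host id>/<pathname>" into (host, pathname)."""
    host, _, rest = name.partition("/")
    return host, "/" + rest


print(split_net_path("/net/A/usr/doc"))
print(split_explicit("B/usr/doc"))
```

In both forms the host id is visible in the name itself, so neither is location independent; they differ only in whether the host id hangs under a special directory or leads the name.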


Sunshine: 

There is a conflict between REAL and IDEAL worlds. In
the Real World, we tend to involve the user in
specifying the location of a function (service). In the Ideal
World, we would like to give the user abstraction,
generic naming and location independent naming.

Livesey:

Part of the problem is that the concept of the size of
the universe (of which the system forms a part) is
implicit in the system at a high cost. One is then
forced to choose between add-on features such as:

/net/A/resource

which are not location independent on the one hand, and
a more or less complete rewrite on the other hand.
UNIX is an example of such a system that makes
assumptions about the size of the universe.

Meisner:

We now have choices between

i) Centralized Directories

which can now be made very reliable

ii) Distributed Knowledge

iii) Tree Structures


Livesey:

(iii) is just a disguised directory method. There are
really two choices: centralized and distributed.

Hassan:

Efficiency may dictate tree structures rather than
directory tasks. This was a factor in the MULTICS
design.


3.4 POSITION PAPERS


3.4.1 Hamilton


Addressing and Security
by

Jim Hamilton

Digital Equipment Corporation

Because of the ever increasing complexity of software
development and maintenance, providing any programming environment
which complicates software development would be a mistake.
This argument leads to a view of distributedness as a
property of the implementation of a system, and not of the
application development environment.

Addressing and protection are critically important in
application development. The above view of distributedness
implies that addressing must be location independent. That
is, local and remote objects must be addressed identically.
Furthermore, I believe that addresses should also be
independent of the context of reference (different processes
should address the same object in the same way), and uniform
across all object types (hardware defined objects, system
defined objects, and application defined objects should all
be addressed similarly).

I also believe that the use of processes to abstract all
other objects is a mistake, for several reasons: 1) it
restricts the flexibility of the environment for the
execution of functions, 2) it often forces the invention of
additional addressing mechanisms within the application, 3) it
is inadequate to address system and hardware defined objects
(e.g., devices), 4) it inevitably colors the application
designer's conceptualization of the system, and finally, 5)
it does not appear to be necessary.

To achieve a distributed implementation, it will still be
necessary to solve the problems of physical communication
and its associated addressing problems at a lower level.
But the problems are considerably simplified since the
mechanisms can now be highly specialized, because they are
not visible to the application designer.

I believe that the notion of capability based addressing,
when properly interpreted and implemented, provides all of
the properties mentioned above. Moreover, it can be
naturally extended to provide capability based protection,
which is further discussed below. The challenge is to
achieve an implementation which is cost-effective, and which
still has all of the necessary properties. A failure in
either domain will be fatal. An even greater challenge is
to convince the computer industry that the inevitably higher
cost of the basic system will be more than offset by the
reduced cost of software.

I believe that the issue of sharing is partially separable
from that of addressing. Context independent addressing is
a prerequisite for sharing, but its existence does not imply
concurrent access by separate processes. Concurrent access
to immutable objects should be possible, for performance
reasons, but concurrent access to mutable objects now
appears to be a dangerous mistake. By precluding this kind of
sharing, we also simplify the construction of distributed
implementations.

Given an addressing mechanism with the properties mentioned
above, a variety of protection mechanisms can be
implemented. Capability based protection still seems to be
the most promising of these, although it has been criticized
as inappropriate for distributed implementations. I tend to
reject this criticism, but the notion of self-authenticating
capabilities has been developed at Berkeley to address this
problem.

The notion of system security has many different aspects.
Included among these are physical security, correctness of
implementation, and the logical access control model being
implemented. In comparison with centralized
implementations, distributed ones seem notably weaker in
physical security, and possibly weaker in correctness
because of greater complexity. The access control model
should not, in principle, depend upon the implementation. I
believe that these are inherent problems with distributed
implementation, but that, with the suitable use of
encryption, such systems can still be acceptably secure.


3.4.2 Sunshine


Addressing 


Carl  Sunshine 
RAND  Corporation 

Any discussion of addressing must start by making a clear
distinction between NAMES (who), ADDRESSES (where), and
ROUTES (how to get there), on which John Shoch of Xerox PARC
has written an excellent note [SHOC 78].

Several key concepts or capabilities must be included in a
good distributed IPC system. These include generic naming,
location independence, request for service, source routing,
and extensibility. Each will be described separately in the
following paragraphs, although there are clearly some
relationships between them.
Generic naming is the ability to request communication with
a service without specifying the exact process that will
provide the service. This is normally useful when multiple
instances of a process providing the desired service are
available. A specific process is selected (or created) at
the time of the initial request, and bound to the source for
the duration of the interaction. This binding may require
transmitting the specific process ID to the source, or
merely keeping it at the destination. The classic example
of this facility is a timesharing login service.

Location independence is the ability to request
communication with a process by name without knowing its location or
address. Since the source user does not supply the address,
it must be found by the IPC system in some directory. Such
name-to-address directories may be maintained at sources, at
a central server, or at destinations. Names are normally
handled at the source, with the consequent need to change
all tables whenever a host address or name changes or is
added; IBM's SNA centralizes lookup in the SSCP; and the
Irvine DCS kept name tables in destination machines, requiring
broadcast of requests to be recognized by the appropriate
destination. The ARPA Internet Name Server proposed by Jon
Postel in a recent note is another centralized example. A
major feature of location independence is the ability for a
named process to move to a different location without its
users' knowledge. (Of course the directories must be
updated.)
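
A minimal sketch of a name-to-address directory, assuming a single central table, may make the last point concrete. The class and function names and the dictionary standing in for the network are hypothetical, chosen only for illustration.

```python
class NameDirectory:
    """Central name-to-address table (one of the three placements named above)."""

    def __init__(self):
        self._table = {}

    def register(self, name, address):
        # Registering an existing name again models a process moving.
        self._table[name] = address

    def resolve(self, name):
        return self._table[name]


def send_by_name(directory, network, name, message):
    """The sender supplies only a name; the IPC layer finds the address."""
    address = directory.resolve(name)
    network[address].append(message)
    return address


# Hypothetical network: address -> mailbox at that address.
network = {("hostA", 4): [], ("hostB", 9): []}
directory = NameDirectory()

directory.register("printer", ("hostA", 4))
send_by_name(directory, network, "printer", "job 1")

directory.register("printer", ("hostB", 9))   # the process moves
send_by_name(directory, network, "printer", "job 2")
```

The sender's call is identical before and after the move; only the directory entry changes, which is the update the parenthetical remark refers to.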

Request for service is the ability to broadcast a request
for service to an unknown (to the source) number of
potential providers of the service, who return bids to
perform the requested service, thereby identifying themselves.
This is similar to generic naming, but includes facilities
for the source to select among multiple bids. Such a
facility was implemented in the Irvine DCS.
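
The bidding exchange can be sketched as below. The text does not say what a bid contains or how the source chooses among bids, so the numeric cost and the lowest-cost rule are assumptions made purely for illustration, as are the data layout and the function name. This is not the Irvine DCS protocol.

```python
def request_for_service(hosts, service):
    """Broadcast a request and return the name of the chosen bidder.

    Each host is a dict with a "name" and a "services" table mapping a
    service name to the (hypothetical) cost that host would bid.
    Returns None when nobody bids.
    """
    bids = []
    for host in hosts:                       # the "broadcast"
        cost = host["services"].get(service)
        if cost is not None:                 # only providers answer
            bids.append((cost, host["name"]))
    if not bids:
        return None
    return min(bids)[1]                      # source selects among the bids


hosts = [
    {"name": "A", "services": {"compile": 5}},
    {"name": "B", "services": {"compile": 2, "print": 1}},
    {"name": "C", "services": {}},
]
chosen = request_for_service(hosts, "compile")
```

The source needs no prior knowledge of which hosts offer the service; providers identify themselves by bidding.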

Source routing is the ability for the source to identify the
destination by specifying a route to it. This is necessary
in loosely concatenated systems where no global address
space exists. The route is given in terms of a sequence of
addresses through successive switching points or systems
which each have independent address spaces. Hence this
concept is also called path addressing. Disadvantages are
the need for the source to maintain connectivity
information, and the variation of a given destination's "name"
(consisting of the route) depending on the location of the
source.
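
A small sketch of path addressing follows, under the assumption that each system can be modelled as a table from its own local addresses to either a neighbouring system or a mailbox. The names and the numeric addresses are invented for the example.

```python
def deliver(source_system, route, message):
    """Follow a source route one local address at a time.

    Each element of `route` is interpreted in the address space of the
    system reached so far; the last element must name a mailbox (a list).
    """
    node = source_system
    for local_address in route:
        node = node[local_address]
    node.append(message)


mailbox = []
system_b = {7: mailbox}        # B knows the destination as local address 7
system_a = {3: system_b}       # A reaches B through its local address 3
system_c = {1: system_a}       # C reaches A through its local address 1

deliver(system_a, [3, 7], "from A")
deliver(system_c, [1, 3, 7], "from C")
```

The same mailbox is named `[3, 7]` from A and `[1, 3, 7]` from C, which illustrates the second disadvantage listed above: the destination's "name" depends on where the source sits.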

Extensibility is the ability to add new users (addresses) to
the system. To add new users at an existing level of the
address space, sufficient room must be available in address
fields, or they must be extensible. Adding additional
layers of addressing often proves a major problem, for
example replacing a user by a network of many users. If the
hierarchy is fixed (e.g., <net/local>), then the bottom
"leaves" of the addressing tree cannot be replaced by
subtrees. In this case, addressing must be used to deal with
networks outside the fixed hierarchy. This is a serious
problem with attachment of private networks to public data
networks.

Interconnecting two previously independent systems is an
important subcase of extensibility. All the users of one
system can be given new addresses in the other system if
such widespread changes are acceptable. Alternatively, some
unused local addresses in each of the systems may be mapped
into addresses in the other system if only a limited number
of users must be accessible. Finally, if the addressing
hierarchy is extensible, one system can be attached as a
subtree of the other, or both can be made subtrees of a
higher level.


3.4.3 Gordon

Addressing  &  Security 
by 

Robert  L.  Gordon 
PRIME  Computers 

An extremely important aspect of interprocess communication
is the scheme used for addressing and naming the processes
and communication paths used. The importance of this
subject stems from the fact that in any addressing scheme
protection and control mechanisms are explicitly or
implicitly present and either aid or hinder the user's ability
to share objects. Many current systems have inadequate
facilities for identifying names and controlling access to
the processes within the same host, let alone for processes
residing on other hosts. Part of the problem stems from an
inconsistent view of the relationship between the names and
uses of files, devices, processes, users, mailboxes, and generic
and specific system services. The utility of abstracting
many of the above objects as processes has increased the
importance of "process naming" and "process addressing" in
overall system design. Therefore, until those basic issues
are settled, the design of specific interprocess
communication primitives is difficult, since they cannot focus on the
fundamental objects that they will be dealing with.

Fault Tolerance & Security
by 

Robert  L.  Gordon 
PRIME  Computers 

Any communication is inherently an error prone process due
to both the natural distortion of the medium and the
contextual requirements needed for interpreting the transmitted
message. In attempting to design robust interprocess
communication primitives, one of the more difficult tasks is the
defining and handling of the many (natural) errors that can
occur. Control of communication mechanisms between
processes fundamentally depends on how the designer envisions
process relationships. If process relationships are tree
structured, then the status and control of a process's
communication with other processes might be monitored and
controlled by the parent. On the other hand, if each process
wants to maintain the concept of sovereignty, then the basic
challenge is either how to provide the ability for
cooperating processes to establish a monitor process that is capable
of controlling the communication paths between the
processes, or how to build into the communication primitives
mechanisms for the detection of and recovery from errors.
Since error recovery must make assumptions about lines of
authority and responsibility between system components, many
of the issues associated with system security are pertinent
to this discussion.


3.4.4 Chesson


IPC Opinions
by 

G.  L.  Chesson 
Bell  Laboratories 


Process Naming

Process names, file names, and I/O stream names should
reside in the same name space. This avoids the tyranny of
the "access method" and attendant problems of making a
program that can "talk" to anything in a system. One can
allow process names to be passed into processes in the same
way that file names and I/O streams are passed around, and
this in turn permits progress toward interactive command
processors that can set up graph-like structures of
processes, file I/O, and IPC streams.


Generalization of Mechanism

A philosophy that has been proven many times over in
language design may be stated as follows: it is "bad" to
provide more than one mechanism for a particular operation
or function. This is a roundabout way of saying that there
are benefits to be gained by providing a single IPC
mechanism for use by "local" processes, i.e. on the same
machine, and "remote" processes on different machines.


Transport Mechanism

It is fine to use shared objects (memory, files) for
interprocess communication, but it is important to hide this
fact. The reason is that explicit sharing of objects is not
portable with respect to different machine and operating
system architectures and should be considered a local
optimization. Thus, IPC primitives at the compiler or
operating system level should appear as I/O-like interfaces that
imply copying of data even if they do not actually copy data
on some systems.


IPC in Programming Languages

Most IPC proposals for inclusion in programming languages
amount to little more than interfaces to subroutine
libraries which a) cannot be inherited by processes across
process fork operations, b) belong in the operating system
anyway, and c) were done better by Burroughs Corp in DCALGOL
1" years ago. The result of adding IPC to a language is
analogous to, and about as useful as, the notion of a file
system in Pascal. A representation of the fundamentals of IPC that
belongs more to the programming language realm than the
operating system realm has yet to be demonstrated, and would
fill a much-needed gap.


Hardware

There are applications for which IPC bandwidths must
approach or exceed disk speeds. It is clear that such
performance cannot be obtained with software (or even firmware)
alone. Although there may not be much interest in this sort
of thing at the IPC workshop, I have been working toward
hardware and firmware implementations of my software
mechanisms.


Flow Control

IPC mechanisms need flow control. It is better to have a
scheme where the sender self-blocks than schemes which depend
on "stop" messages from the receiver. For most applications
the scheme used in UNIX for pipes and other things would
seem to work well: the sender blocks (sleeps) on a queue
length upper limit and is awakened when the queue drains
below a lower limit. There exists a timeout call which can
wake the writer if the queue drains too slowly or is
otherwise delayed. An additional non-blocking mechanism has been
built into the mpx software (see section 7.7) which is
useful in those few cases where blocking cannot be tolerated
-- network servers and the like. This avoids the problems
that occur with varying process and communication delays or
loss of control messages.
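
The upper-limit and lower-limit scheme can be sketched as a small state machine. This is an illustrative model, not the UNIX or mpx implementation: the class name and the boolean return value of `send`, which stands in for putting the sender to sleep, are assumptions, and the timeout call is left out.

```python
from collections import deque


class WatermarkQueue:
    """Queue whose sender blocks at `upper` and is woken below `lower`."""

    def __init__(self, upper, lower):
        if not 0 <= lower < upper:
            raise ValueError("need 0 <= lower < upper")
        self.upper = upper
        self.lower = lower
        self.items = deque()
        self.sender_blocked = False

    def send(self, item):
        """Return True if accepted, False if the sender must sleep."""
        if self.sender_blocked:
            return False
        self.items.append(item)
        if len(self.items) >= self.upper:
            self.sender_blocked = True       # sender sleeps on the upper limit
        return True

    def receive(self):
        item = self.items.popleft()
        if self.sender_blocked and len(self.items) < self.lower:
            self.sender_blocked = False      # queue drained: wake the sender
        return item


q = WatermarkQueue(upper=4, lower=2)
accepted = [q.send(n) for n in range(5)]     # the fifth send is refused
```

No "stop" or "go" message crosses between receiver and sender; the sender's state depends only on the queue length, so there is no control message to lose. The gap between the two limits keeps the sender from being woken and put back to sleep on every single receive.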


Cognoscenti agree that message-passing IPC schemes are
equivalent in "power" to schemes which employ shared objects,
although the message schemes seem "harder". This has not
been proved or disproved mathematically, although there is
substantial empirical evidence that pairs of processes can
be synchronized by exchanging messages.


Food for Thought

I submit that it is seductively easy to synchronize process
pairs, but that strategies are needed for synchronizing
groups of processes in various ways. Is it reasonable to
set up "overseer" processes that arbitrate and synchronize
things, or are there better ways that can be proven correct?
For some things, like call-processing in my network, I use
overseer processes because they reduce complexity and can be
made reasonably efficient. For other things, like
synchronizing a process group carrying out a parallel
computation, I would try to eliminate the deus ex machina and
use direct process to process methods.


Portability

It is important to demonstrate universal IPC ideas and to
distinguish local optimizations and special cases within the
universal model. One would hope that a suitable IPC model
could be used with portable operating system ideas to bring
up compatible IPC mechanisms on dissimilar machines.
Section 7.7 on Data Communications Software outlines some ideas
that have been partially demonstrated to have portability
properties.


SECTION 4

INTERPROCESS SYNCHRONIZATION


4.1 SUMMARY REPORT


4.1.1 Statement of the Problem

1) Synchronization via explicit communication (messages).

2) No global memory.

3) System-wide control with only inaccurate/incomplete
information on the system state, without any centralized
procedure, data or hardware.

4) Transit delays are: variable, unpredictable, unbounded.

5) Loss, error, desequencing, duplication of messages.

6) Other failures (processors).


4.1.2 Solution Space


GENERAL CONFIGURATION (LOGICAL)
FOR A SINGLE SET OF MESSAGES

1) Distributed service.

2) Survive sender/receiver failures.

3) Non-technical reasons.

4) Modularity (growth, ...).

5) Performance.

CONFIGURATIONS: 

a)  "Single  Sender  /  Single  Receiver" 

Single  Path  Signalling 
End-to-end  Synchronization 

(Used  to  achieve  flow  control  for  example) 


b)  Single  Sender  /  Multiple  Receivers 
Multiple  Path  Signalling 


                         PROCESSING AT RECEIVERS
                           IDEN.       DIFF.

MESSAGE     IDEN.           1           2
CONTENT     DIFF.           3           4

(1) Pure broadcasting in a fully replicated system.

(2) Pure broadcasting in a heterogeneous replicated
data base.

(3) Transaction processing in a homogeneous
(replicated?) system.

(4) Transaction processing in a heterogeneous
replicated data base.


OBJECTIVE: To maintain a unique ordering of incoming
messages for all receivers (whether initially
fortuitous or enforced).

c) Multiple Senders / Single Receiver

Multiple Path Signalling

OBJECTIVE: Reveal/Cause/Express relationships between
incoming messages belonging to different flows.

d) Multiple Senders / Multiple Receivers

Multiple Path Signalling

1) Fully replicated systems
same objective as (b)

2) Partitioned systems
same objective as (c)

3) Mixed systems
same objective as (b) for dynamically changing
subsets of receivers plus the same objective as (c)

4.1.3 Existing Solutions

a) Logical Clocks: L. Lamport

To implement sequential (totally ordered) processing in a
distributed manner (each process has an image of "The
Waiting Queue") - may be used to achieve mutual
exclusion.

b) Physical Clocks: L. Lamport

How to implement logical clocks on a set of physical
clocks (unique physical time frame).

c) Logical Clocks plus Voting: R. Thomas

How to resolve conflicts between
simultaneous/concurrent processes competing for
identical resources (fully replicated systems).

d) Eventcounts, Sequencers: Reed/Kanodia

To observe (READ, AWAIT) or to express the occurrence of
some event (ADVANCE) - to serialize events.

e) Circulating Token: G. Le Lann

- Without tickets

To achieve mutual exclusion.

- With tickets

To serialize, to express relationships
between events.

f) Some "naive" or less general solutions:

- Shared Variables: E. Dijkstra

- Monitors and Messages: P. Brinch Hansen
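
As an illustration of item (a), the sketch below shows the two update rules of Lamport's logical clocks and how stamps extended with a process id give a single ordering on which every receiver can agree. The class and method names are illustrative assumptions; message transport, the replicated waiting queue, and the mutual exclusion protocol built on top are omitted.

```python
class LogicalClock:
    """Lamport logical clock; stamps are (time, process id) pairs."""

    def __init__(self, pid):
        self.pid = pid
        self.time = 0

    def tick(self):
        """Local event: advance the clock and return the event's stamp."""
        self.time += 1
        return (self.time, self.pid)

    def send(self):
        """Sending is an event; the returned stamp travels with the message."""
        return self.tick()

    def receive(self, stamp):
        """On receipt, move past both the local clock and the sender's time."""
        self.time = max(self.time, stamp[0]) + 1
        return (self.time, self.pid)


p1, p2 = LogicalClock(1), LogicalClock(2)
a = p1.tick()             # local event on process 1
m = p1.send()             # process 1 sends a message
b = p2.tick()             # concurrent local event on process 2
r = p2.receive(m)         # process 2 receives the message
total_order = sorted([r, b, m, a])
```

Comparing the pairs orders every event, with the process id breaking ties between equal clock values, and a receive always sorts after the corresponding send.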



4.1.4 Attributes

a) Response time.

b) Overheads (traffic, processing, storage).

c) Extensibility (is full connectivity required, global
knowledge of the system status, ...).

d) Deterministic synchronization / probabilistic
synchronization / convergence.

e) Fault tolerance.

- Detection.

- Recovery.

f) Simplicity (correctness proving, implementability,
headaches, ...).


4.1.5 Other

a) Effects of probabilistic synchronization.

b) System considerations:

- Hard/soft partitioning.

- Application processing / system processing
partitioning.

c) Evaluation of solutions with respect to

- Attribute space.

- Problem space.

d) Policies (fairness, enforced priorities).

e) Adequacy to resource management.

f) Classification of solutions.


4.2 POSITION PAPERS


4.2.1 Lee


Interprocess Synchronization
by

Edward Y. S. Lee

TRW Defense and Space Systems Group

My interest in IPC is mainly connected with update
synchronization in redundant distributed data bases (DDB). The
protocols developed for IPC must be viable and be able to
satisfy the following major requirements for DDB operations:

1)  Performance  (response  time) 

2)  Efficiency 

3)  Deadlock  prevention 

4)  Error  recovery  (surviving  errors  and  faults 
and  continue  operation) 

5)  Security 


Recent state-of-the-art developments in this area can be
divided into two major categories:

1) Protocols associated with a centralized
control approach [ALSB 76, BADA 78, ELLI 77,
ESWA 76, ROTH 77]

2) Protocols relying on distributed control
[GRAP 76, JOHN 75, ROTH 77, STON 78, THOM 77]


However, most of the protocols do not include serious
consideration of interprocessor communication, but rather
take the approach that some kind of messages can be passed
among the distributed processors for communication, and
leave it to someone else to worry about.

There are considerable difficulties in taking this kind of
approach in a loosely coupled distributed system, because
IPC is the life line of the system. It is needed for the
distributed control (operating system), distributed data
base operation, recovery of the system as well as the DDB
under fail-soft and fail-safe conditions, and reconfiguration
of the network when one or more processors are disabled.
All these essential functions of a distributed system demand
efficient and fail-safe IPC mechanisms.

The second obstacle is the lack of evaluation criteria and
methodologies to test and measure:

1) Performance

2) Efficiency

3) Validity

4) Verifiability

of any protocol that is being proposed as the best protocol
for DDB. There are some efforts present in this area [GARC
78, SUNS 76], but a lot more work will be required.

In a practical system, it is very likely that a mix of
several protocols will be used for updating redundant
distributed data bases depending on the specific situation
and requirement. However, it should be possible to have a
unified approach to IPC for all protocols. Additional
research in this area is needed.


SECTION 5

MECHANISMS - IMPLEMENTATION, UTILIZATION, and PERFORMANCE


5.1 WORKING GROUP SUMMARY REPORT


Interesting But Unresolved

Data interface to program not resolved
Control interface to program
"To poll or not to poll"

Events, interrupts, on-conditions


Mechanisms


Signals
Events
Semaphores
Shared Memory
Monitors
Message Queues
Pipes
Ports
Full Duplex Streams
Virtual Procedure Calls


Characteristics of the Mechanisms

                         Shared   Explicit  Event      Process   Ease of
                         Objects  Data      Operating  Creation  Distributed
                                  Movement  By         Side      Implemen-
                                                       Effects   tation

Signals                  U        N         na         N         +
Events                   U        N         na         N         +
Semaphores               S        N         na         N         -
Shared Memory            S        N         S/R        N         -
Monitors                 S        Y         R          N         0
Message Queues           S/U      Y         S/R/T      Y         +
Pipes                    U        Y         na         N         +
Ports                    S/U      Y         na         N         +
Full Duplex Streams      U        Y         R          N         +
Virtual Procedure Calls  U        Y         T          Y         +

Shared Objects:      S = Shared    U = Unshared
Event Operating By:  S = Sender    R = Receiver    T = Transport Mechanism
na = not applicable


Desirable Qualities of Mechanisms

Performance 

Bandwl dth 
Delay 

Provab 1 1 1 1  y 

Correctness  of  use 
Correctness  of  Implementation 
Securl ty 
T  ransparency 
Naming 

Location  (Physical) 

Environment  (Logical) 

Separation  of  control  from  data 
Complete  and  small  set  of  primitives 
Fault  tolerance 

Encapsulat Ion 

Detection 

Recovery 

Size  of  fault  set  covered 

NOTES:  The  priorities  used  to  weight  these  desirable 

quail  ties 
depend  on: 

-  Application 

-  Level 

-  Environment 
Desirable Qualities of Mechanisms

[Table: rows are the ten mechanisms (Signals, Events,
Semaphores, Shared Memory, Monitors, Message Queues, Pipes,
Ports, Full Duplex Streams, Virtual Procedure Calls); columns
are Performance, Provability, Security, Transparency (Naming),
Transparency (Location), Transparency (Environment),
Control/Data Separation, Primitive Completeness/Size,
Encapsulation, Error Detection, Error Recovery, Fault Set
Covered, and Capabilities. Cell entries not reproduced.]

AD = Addressing Mechanism Dependent
C  = Control only
C  = Control    D = Data
Comments on Mechanism Qualities

1)  A functionally complete IPC
    mechanism requires both data and
    control capabilities

2)  All were considered to be "basic"
    mechanisms -> No embellishments to
    improve desirable properties

3)  Thus ability to recover from faults
    depends on implementation

4)  Another trade - Bandwidth vs.
    status consistency

5)  Perceived hierarchy (in mechanism
    list)

6)  Omissions

    - Broadcasts
    - Addressing
    - IPC mechanisms ??

7)  A design exercise to try to overcome
    "-'s" in table would be
    interesting --- Also table completion


1)  Migration of applications from
    centralized to distributed environment

2)  Not enough known about these
    mechanisms:

    - Complexity of IMPL
    - Size of IMPL
    - Efficiency of IMPL
    - Useful hardware assists

3)  Common understanding of all
    mechanisms
    - Dictionary

4)  Lack of a number of implementations

5)  Cost / time / complexity

6)  Premature standardization

7)  Difficulty of modifying /
    experimenting with hardware
    support devices

8)  Premature vendor mechanism
    selection

9)  Compatibility

    - Obstacle
    - Objective

10) Evaluation criteria

11) Papers don't tell reasons for
    designs (some designs based on
    few examples)

12) Definitions of universes
Research Questions:

1)  Identify collections of primitives
    for

    - Easy programmer understanding
    - Efficiency
    - Match to application

    (Answer probably depends on
    environment)

2)  Fault tolerance of IPC mechanisms
    not well understood

3)  Trade -- User or IPC mechanism?

4)  How much must user be aware of
    process creation/existence?

5)  How should responsibility be
    distributed? Visibility of fault
    responsibility.

6)  How to decouple bindings:

    - Modules to graph
    - Process to nodes
    - Resources to processes

7)  What set of IPC mechanisms is

    - Easy to use
    - Complete
    - Efficient

8)  Refine virtual procedure call
    mechanism.

9)  Tools for top-down design

10) How to select architectures from
    option criteria

11) How to decompose applications
5.2  AMPLIFYING MATERIAL


5.2.1  Prepared by the Working Group

An attempt was made to define "a set of primitives that
allows an application software engineer to design the best
solution for his problem." It was quickly realized that
this is not an easy task. Some of the issues involved are:

    1)  Some applications require highly reliable
        IPC, while in others, communicated information
        becomes useless after a certain period
        of time. A single set of primitives to
        implement IPC may not solve both types of
        problems.

    2)  Should IPC primitives be operating system
        services, or should IPC constructs be parts of
        various programming languages? A relevant
        reference to this latter proposal may be
        found in [HOAR 78].


At this point, it was felt that it was necessary to outline
the hierarchy of levels at which IPC mechanisms can be
invoked. For each level, we attempted to describe those
objects which may be manipulated and those IPC operations
which may be performed on each object, if any.

Hierarchy of Levels

    Command Level
    High Level Languages
    Operating System
    Instruction Level
    Microcode Level
    Hardware Level


The description of objects and IPC operations can be
enumerated for three different situations:

    1)  Accepted practice - those commercially
        available

    2)  State of the art - current practices of
        researchers in the field

    3)  Wish list


Enumeration of Quantities for Accepted Practice

Command Level:

objects - process, file, link, device, program,
          task graph, directory

IPC operations -

    files:      file locks (control function)
                pipes

    processes:  create
                delete
                link via a pipe
                suspend
                resume
                status

    links:      creation
                temporary files
                link management in DEMOS


Reference: [BASK 77].


Note: Though not all types of objects are available on many
systems, some of them can be used to emulate those
capabilities which are unavailable. For example, temporary
files are used in UNIX to emulate pipes.


High Level Languages:

objects - typed objects (integers, reals, characters, etc.)
          semaphore
          monitors
          events
          ports
          shared common (typed objects)

Except for the use of shared typed objects (via global common
areas), current languages commonly available do not use
the other objects for IPC (e.g., PL/I). Almost invariably,
one must drop into a runtime library routine or to the
operating system to perform IPC functions.

    PL/I is most progressive

    Algol 68 provides some capabilities

    APL supports shared variables

Miscellaneous notes:

There was some discussion concerning the two types of
commonly used IPC mechanisms: message-oriented vs.
procedure-oriented (monitor). A good reference to this area
is [LAUE 79].
5.2.2  Prepared by [illegible]


5.2.2.1  Introduction and Explanation

The IPC mechanisms described here are known as "primitive"
for several reasons: they are primitive in the sense that
they are low-level building blocks from which more
sophisticated forms of IPC can be built, they are mostly
oriented towards two-party communication, the simplest case,
and they are mostly derived from existing uniprocessor
systems.


5.2.2.2  Desirable Properties

It is fairly easy to list some desirable properties that any
interprocess communication mechanism should have:

Performance -- In terms of bandwidth and also
    delay. We would like mechanisms with a
    minimum of overhead, in order to maximize
    performance. This should not, of course,
    reduce functionality.

Provability -- A desirable property for any IPC
    mechanism is that it lend itself to
    the verification of systems which are built
    up using processes.

Security -- By this we mean protection of two
    communicating parties from one another, and also
    with respect to third parties, in terms of
    leakage and interference.

Transparency -- This refers back to the issues of
    naming and location. The users of an
    interprocess communication mechanism should
    not have to deal with that mechanism at other
    than the advertised level, nor should they
    have to be aware of the details of its
    implementation.

Separation of Data and Control -- It may or may
    not be a good property of an IPC mechanism to
    contain elements of both data and control.
    In some implementations, data and control
    (signal) transfer from sender to receiver are
    carried out in the same operation. Separate
    data and control transfer operations can, of
    course, be combined in higher-level
    non-primitive interprocess communication
    operations.

Completeness and Smallness -- Interprocess
    communication primitives should certainly be
    complete, in the sense that one should be
    able to do any operation which is valid in
    the given system without introducing new
    primitive operations. It is not so clear
    that they should be small, consistent, of
    course, with performance.

Fault Tolerance -- This leads to the concepts of
    encapsulation and recovery. In order to
    achieve fault tolerance, an operation should
    fulfill the following conditions:

    -   faults should be detected.

    -   faults should be handled at the
        appropriate level, and not simply
        passed back upwards towards the
        user.

    -   faults generated at a lower level
        should not terminate a user session.
        Instead, they should be
        recovered at a level close to
        that at which they occurred.

    -   in interprocess communication, if
        data or control transfer fails,
        it may be sufficient to inform
        the sender, or, in some critical
        applications, it may be necessary
        to inform both the sender and the
        receiver that some message or
        control signal did not get
        through.

    The concept of encapsulation suggests the
    enforced localization of errors, so that an
    error in the communication between two
    processors can have no effect on any others.

    The concept of recovery suggests that
    whatever errors do occur are cleaned up
    in such a way that a consistent system state
    is restored, and that unresolved error states
    are not simply passed up the line. Error
    messages of the form:

        SUBNETWORK ERROR - PLEASE LOG IN AGAIN

    should never occur.

Cost -- The concept of cost is very difficult to
    define exhaustively, but one can suggest some
    kinds of cost which can be incurred:

    -   implementation

    -   performance

    -   application flexibility

Note that in the evaluation of primitive mechanisms given in
section 5.1 we assume a fairly standard implementation. The
properties above clearly depend in part on implementation
and we cannot give any hard and fast rules.

5.2.2.3  IPC Taxonomy

One of the most obvious dimensions along which to
differentiate IPC mechanisms is whether they are
message-based or not. Mechanisms can, of course, be
data-transfer based, without being message-based.

Examples: Pipes, ports, full-duplex streams.

5.2.2.3.1  Non-message-based IPC

These are clearly the IPC mechanisms favored in those
distributed systems which are themselves not message-based.
Instead of messages, these depend on a variety of
communication mechanisms:

    1)  Signals

        Signals are process interrupts, which can
        arrive with or without accompanying type
        information, and perhaps the identifier of the
        originator. A signal may cause a transfer of
        control inside the receiver process, and
        there may be enable/disable mechanisms,
        analogous to those for hardware interrupts.

    2)  Events

        An event is a state variable. One should be
        able to test it and set it. It should be
        possible to implement a wait on the event by
        means of a test in a loop.

    3)  Semaphores

        A semaphore is a storage cell with an
        associated queue of processes, and with two
        operations, wait and signal (no relation to
        signals in section 3.2.1.1), which have side
        effects.

    4)  Shared Memory

        Shared memory consists of data cells which
        are accessible to sending and to receiving
        processes, perhaps with an associated access
        discipline which is designed to avoid
        critical section problems in accessing the
        shared resource.

    5)  Ports

        Ports are input/output channels belonging to
        processes. Ports in corresponding processes
        can be connected together by links to form
        communication channels.

    6)  Full Duplex Streams

        A full duplex stream is effectively a
        bidirectional pipe. In place of a sender and
        receiver, the processes at either end of the
        full-duplex stream can both send and receive.
        Naturally, in order to achieve some measure
        of synchronization, a read should suspend
        until a corresponding write is executed at
        the other end of the full duplex stream, and
        vice versa.
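
As an illustration of the semaphore described in item 3 above,
the following is a minimal sketch in Python. It is not taken
from the working group's material: the class name, the use of
threads to stand in for processes, and the per-waiter "gate"
are all assumptions made for the example.

```python
import threading
from collections import deque


class Semaphore:
    """Sketch of a semaphore: a storage cell plus a queue of waiters.

    Hypothetical illustration only; threads stand in for processes.
    """

    def __init__(self, initial=0):
        self._count = initial            # the storage cell
        self._waiters = deque()          # queue of suspended callers
        self._lock = threading.Lock()    # protects the cell and the queue

    def wait(self):
        """Take one unit, suspending the caller if none is available."""
        with self._lock:
            if self._count > 0:
                self._count -= 1
                return
            gate = threading.Event()
            self._waiters.append(gate)
        gate.wait()                      # suspended until signal() opens the gate

    def signal(self):
        """Release one unit, waking the longest-waiting caller if any."""
        with self._lock:
            if self._waiters:
                self._waiters.popleft().set()   # hand the unit straight to a waiter
            else:
                self._count += 1
```

Both operations have side effects on the cell and the queue,
which is the property the text singles out. Handing the unit
directly to the oldest waiter is one possible queueing
discipline among several.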


5.2.2.3.2  Message-based IPC

These are the IPC mechanisms which depend on messages
between processes. They can be further subdivided along the
following lines:

    1)  Single send          p1 --> p2

    2)  Single receive       p1 <-- p2

    3)  Multiple send        p1 --> subset of P

    4)  Multiple receive     p1 <-- subset of P

Blocking and Non-blocking Primitives

A further way of subdividing interprocess communication
primitives is on the basis of whether they are blocking or
non-blocking in nature. A blocking primitive is one which
causes its invoking process to be suspended until the
primitive operation is completed. Thus, after invoking a
blocking receive, a process will suspend (sleep) until some
message does arrive.

Distributed systems have been implemented with blocking
send/receive, with blocking send and non-blocking receive,
and with non-blocking send/receive.
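
The difference between a blocking and a non-blocking receive
can be sketched as follows, using Python's standard queue
module as a stand-in mailbox. The function names receive and
try_receive and the "hello" message are hypothetical, chosen
only for this illustration.

```python
import queue
import threading
import time

mailbox = queue.Queue()   # stand-in for a message queue between two processes


def try_receive(box):
    """Non-blocking receive: return a message, or None if none is waiting."""
    try:
        return box.get_nowait()
    except queue.Empty:
        return None


def receive(box):
    """Blocking receive: the caller is suspended until a message arrives."""
    return box.get()


def sender():
    time.sleep(0.1)       # the message is sent a little later
    mailbox.put("hello")


first = try_receive(mailbox)          # nothing has been sent yet
worker = threading.Thread(target=sender)
worker.start()
second = receive(mailbox)             # sleeps until the sender's put()
worker.join()
```

Here first is None because the non-blocking call returns
immediately, while second holds the message because the
blocking call waited for it.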

Virtual Procedure Calls

Virtual procedure calls can be viewed as a highly stylized
form of message passing but entail a great deal more
semantics. They are used in support of an object model -
processes access objects and objects are controlled by other
processes. IPC consists of one process invoking a function
on an object and another process executing that function.
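
A rough sketch of this arrangement, assuming threads in place
of processes and in-memory queues in place of a transport
mechanism, is given below. The Counter object and the owner and
invoke functions are hypothetical names introduced for the
example, not part of any system cited here.

```python
import queue
import threading


class Counter:
    """An object controlled by the owning process (illustrative only)."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        return self.value


def owner(obj, requests):
    """Owning process: executes the functions that other processes invoke."""
    while True:
        msg = requests.get()
        if msg is None:               # shutdown marker
            break
        name, args, reply = msg
        reply.put(getattr(obj, name)(*args))


def invoke(requests, name, *args):
    """Caller side: reads like a procedure call, but is a message exchange."""
    reply = queue.Queue(maxsize=1)
    requests.put((name, args, reply))
    return reply.get()                # suspend until the owner has executed it


requests = queue.Queue()
server = threading.Thread(target=owner, args=(Counter(), requests))
server.start()
a = invoke(requests, "add", 2)
b = invoke(requests, "add", 3)
requests.put(None)
server.join()
```

The caller never touches the Counter directly; only the owning
thread executes add, which is the division of labor the
paragraph above describes.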

5.2.2.3.3  Higher-level Mechanisms

There are also higher-level mechanisms which can be produced
using the primitive operations as building blocks. For
instance, one frequently encounters virtual circuits built
on message passing combined with signalling.


5.2.2.4  References

The following references may be helpful in explaining the
specific IPC concepts identified:

    1)  Semaphores, Signals, Events, Monitors, Pipes:
        [HOLT 78b]

    2)  Virtual Procedure Calls:
        [HAMI nd]

    3)  Message Passing Operating Systems:
        [MANN 77]

    4)  Message Passing versus Procedure Calls:
        [LAUE 79]
5.3  POSITION PAPERS


5.3.1  Peebles


PROGRAMMING ISSUES
by

Richard Peebles
Digital Equipment Corporation

Religious Issues

A programmer's environment (language, operating system
services and model of process structure) tends to be a
religious issue. My religion calls for the simplest
possible environment by providing a set of "orthogonal basis
vectors" for programming. The result is a set of primitives
that allows an application software engineer to design the
best solution for his problem. Orthogonality of software
tools means that one piece, or primitive, does not preempt
design choices for the others. This is to be contrasted
with another approach to simplicity which preempts almost
all choices.

In addition, my religion calls for the removal of
representational irrelevancies to the highest degree
possible. As a consequence, the underlying process structure
is not visible at all to most programmers, nor is the
distributed nature of the machine that implements his
application.

Practical Issues

The difficult part of religion is applying it to our daily
lives. Just what are these primitives, what makes an
orthogonal set, can we find a set of "basis vectors"?
Furthermore, can we reasonably expect to hide the process
and machine structure from programmers? In my view, most
research in distributed systems is (should be) aimed at
answering these questions.

Constraints on IPC Mechanisms

The above goals for the programming environment impose
several constraints on the IPC mechanism. First, it should
be location independent. The same mechanism should be used
for both inter-host and intra-host communication. This
means that a programming decision does not preempt a
process-location decision and vice-versa. A more difficult
question is whether the IPC mechanism should be visible as
such to the programmer. It is possible to provide him with
an extended machine in which IPC appears as the application
of an operator to an operand; this is the approach taken in
our experimental WEB system. It is a simple matter to
construct both datagram and virtual circuit abstractions
with this mechanism if "communicating processes" is a
relevant abstraction. It is considerably more difficult to
provide the operator/operand abstraction mechanism than a
simple send/receive mechanism, particularly if abstractions
are to be enforced.

State of the Art

In vendor-implemented products neither location transparency
nor process structure transparency is usually provided.
Research systems have, for the most part, made IPC an
explicitly separate concept among other abstract extensions of
the operating system. The WEB operator-invocation
architecture is seeking to provide a single mechanism that will
serve as a general basis for "operating system" and user
functions - they are not distinguishable. It is, however, only
in the final stages of design - about to be implemented.

The most significant obstacle to providing an IPC mechanism
that least perturbs the programming interface is historical
artifact. Finding a design that is ideal and that allows
reasonably simple migration of customer applications is a
hard problem. We may be forced to throw up our hands and
call on users to swallow yet another conversion effort.
Will we do it again in 1988 when distributed systems go out
of vogue? Hence my strong belief in the need for process
and machine structure independence of IPC. Early standards
will be a hindrance to this but may be inevitable given the
state of the art and user impatience to build. If that is
accepted, the next biggest obstacles are thin wires and
different architectures. Hiding the network structure is
hard when physical links are under 100K bps. Then too there
is the problem of the complexity of the WEB abstraction
approach - it's hard to understand.
5.3.2  Wallentine

PROGRAMMING ISSUES IN DISTRIBUTED SYSTEMS

by

Virg Wallentine
Kansas State University


Problem

The programmer in a distributed processing environment must
be provided with a set of facilities which permit easy
specification of the distributive properties of his/her
program. The word program here is used to refer to either
the output of a single compilation or the output of
independent compilations of program modules which are to be
communicating via an IPC. These distributive properties
include the specification of the concurrency, data flow,
resource requirements (memory, devices, etc.), and
intraprogram (intermodule) protocol properties inherent in
the execution of a configuration (system) of cooperating
software modules. Given a description of these properties,
an operating system must be able to distribute the user's
program across multiple machines in a manner which is
transparent to the programmer. Traditional approaches to
providing these facilities include the concurrency support
in high-level languages and the resource allocation and
concurrency support in conventional operating systems.

Current Approaches

Several high-level languages such as Concurrent Pascal [BRIN
77] and SP/K [HOLT 78] have incorporated the monitor [BRIN
73] [HOAR 74] concept to provide structured concurrency.
This concept is excellent in a centralized system but relies
on shared data (and therefore shared memory), and is
therefore not an appropriate concept on which to base a
distributed system. However, an effort is underway at the
National Physical Laboratory [DOWS 78] to distribute a
Concurrent Pascal program across loosely coupled
microprocessors. The distribution of passive system
components (such as monitors) on disjoint machines implies many
copy operations for parameters and also additional active
system components (processes) which do not appear in the
program text.

A much more appropriate high-level language concept for
distributed programs is proposed by C.A.R. Hoare in
reference [HOAR 78]. Each function is a sequential process
which is connected to other communicating sequential
processes via input/output. This concurrency support is based on
data flow and not shared data; therefore, it is not
dependent on shared memory. As a result, each function is
distributable. However, it seems that buffering of data
between processes is necessary to improve performance in
distributed systems with slow speed connections. Since the
compiler for such a language presumably can generate the
resource requirements for the program, since processes are
identified by name, and since the protocol between processes
is fixed, enough knowledge is available to distribute a set
of processes which are compiled together.

A second area of programmer concern for distribution occurs
because concurrent program functions (modules) may be
separately generated (compiled). These may well be existing
programs or just separate functions based on programming
style. The interconnection of these modules into a program
is dynamic and therefore requires operating system support.
In early conventional operating systems, the support for
combining these functions into a configuration of
communicating concurrent software functions is specified at
three levels. First, overlap of CPU and I/O is made
available for standard I/O file functions. Second, added
concurrency is achieved only with unstructured (low-level)
facilities for process creation, naming, and communication.
Third, complex job control languages are provided to achieve
allocation of resources to run these functions. In a
distributed system, these JCL steps must be synchronized
across machines. Complex resource control in a distributed
system should certainly not be the programmer's
responsibility. This is alleviated by viewing distributed
operating systems and their executable programs as
cooperating processes. A highly successful system is the
Distributed Computing System of Farber [FARB 73]. In this
system, the structure and distribution of the set of
processes is transparent to the user, and a high level of
concurrency is achieved without use of low-level process
control primitives.

Process naming of cooperating processes is still burdensome
to the programmer. The same problem also occurs in current
"mailbox" schemes as epitomized by the VAX 11/780 system
[DEC 77]. The naming or numbering of mailboxes must be
known to the programmer or a creating process. This is
commonly referred to as the IPC-setup problem, a term coined
by Elliott Organick in reference [ORGA 72]. The designers of
UNIX [THOM 74] [RITC 78] sought to alleviate this problem.
They invented the "pipe." In UNIX a user program, running in
its own process, may take the place of a file in a manner
which is transparent to the original program. Each program
may have its standard input and output files replaced by
programs, thus building via the UNIX shell arbitrarily long
linear chains (a pipeline) of programs. UNIX automatically
transfers the data between processes and synchronizes the
processes as it intercepts the standard input and output file
operations.

UNIX "pipes" eliminate the need for process naming and treat
concurrency, resource allocation, and inter-process protocol
as a data flow problem. Interprocess protocols are treated
simply as simplex data streams. The job control language
provided by the UNIX shell becomes a pseudo data flow
language and resource allocation is transparent to the
programmer. However, there are a considerable number of
programmer protocols which are not served by "pipes." As
acknowledged in reference [RITC 78], "pipes" cannot be used
to construct multi-server subsystems.

UNIX will support general interprocess communication
protocols but these are not generated by the shell. These
can be programmed as a set of child processes whose "pipes"
have been set up by a parent process.
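
The arrangement in which a parent process sets up the pipes of
its children can be sketched as below. This is an assumed,
present-day illustration using Python's subprocess module
rather than the UNIX shell of the period; the two child
programs (PRODUCER and SORTER) and the function run_pipeline
are hypothetical.

```python
import subprocess
import sys

# Two tiny child programs, run with the current Python interpreter.
PRODUCER = "print('b'); print('a'); print('c')"
SORTER = "import sys; sys.stdout.writelines(sorted(sys.stdin))"


def run_pipeline():
    """Parent connects the producer's standard output to the sorter's input."""
    first = subprocess.Popen(
        [sys.executable, "-c", PRODUCER],
        stdout=subprocess.PIPE,
    )
    second = subprocess.Popen(
        [sys.executable, "-c", SORTER],
        stdin=first.stdout,
        stdout=subprocess.PIPE,
    )
    first.stdout.close()      # parent drops its copy so the sorter sees end-of-file
    output, _ = second.communicate()
    first.wait()
    return output.decode().split()


if __name__ == "__main__":
    print(run_pipeline())
```

Neither child names the other; each simply reads its standard
input or writes its standard output, and the parent alone
decides how the two are joined.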

A Research Direction

If we are to be successful in distributing programs across
highly distributed systems, we must provide the programmer
of dynamically interconnected cooperating processes a job
control language (software configuration control) as easy to
use as Hoare's communicating sequential processes. It seems
that the most promising direction is to extend the concept
of the UNIX shell to automatically generate the more complex
protocols available to the parent processes previously
described. It must then also be extended to generate
(representations of) distributable configurations of
communicating processes.

Work in this area is underway at Kansas State University.
The project* involves development of a Network Adaptable
Executive (NADEX) [YOUN 79]. The attempt is to permit the
user to specify data flow at the command level and have the
command interpreter generate a distributable software
configuration of nodes connected by full duplex data transfer
stream connections (DTS connections) to form an undirected
graph. In general, a node may be thought of as a process.
Each of the connections consists of two independent
bi-directional data transfer streams. One of these streams
uses small parameters while the other uses a standard-sized
data buffer. The data buffers carry along with them size
and status indicators whereas the parameter buffers contain
only a small amount of user-supplied data.

A user program running in a node performs serial buffered
READ and WRITE operations on its various connections. The
connections are numbered, and the program attaches
particular meanings and implements particular protocols for
each of its connections. A connection can connect a node
either to a user program or to a system process used to
access a file or an I/O device. The program cannot tell the
difference between these modes of operation. This clearly
provides all of the power of the UNIX pipelines while
removing the linearity constraint on the structure of the
connection graph. Also, the connections are bi-directional so
that, for example, a write-request/read-response protocol to
access a random file can be implemented.
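
The write-request/read-response protocol over a numbered,
bi-directional connection might look roughly like the sketch
below. It is not the NADEX interface: the Connection class,
the connect, file_node and user_node functions, and the use of
threads and a single queue per direction are simplifying
assumptions made for illustration.

```python
import queue
import threading


class Connection:
    """One end of a bi-directional connection (two one-way queues)."""

    def __init__(self, inbox, outbox):
        self._in = inbox
        self._out = outbox

    def write(self, item):
        self._out.put(item)

    def read(self):
        return self._in.get()         # suspends until the peer writes


def connect():
    """Create a connection and return its two ends."""
    a_to_b, b_to_a = queue.Queue(), queue.Queue()
    return Connection(b_to_a, a_to_b), Connection(a_to_b, b_to_a)


def file_node(conn, records):
    """Stand-in for a system process that serves a random-access file."""
    while True:
        index = conn.read()
        if index is None:             # end-of-session marker
            return
        conn.write(records[index])


def user_node(connections):
    """User program: connection 0 happens to lead to the file node."""
    results = []
    for index in (2, 0):
        connections[0].write(index)                # write-request
        results.append(connections[0].read())      # read-response
    connections[0].write(None)
    return results


user_end, file_end = connect()
server = threading.Thread(
    target=file_node, args=(file_end, ["alpha", "beta", "gamma"])
)
server.start()
fetched = user_node([user_end])
server.join()
```

The user node sees only a numbered connection; whether the far
end is another user program or a file-access process is
invisible to it, as the text requires.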

For these serial buffered READ and WRITE operations, a
priori protocol knowledge can be specified to the underlying
data flow implementation (buffer control) to enable it to
maintain a check for validity of user protocol (in terms of
data flow) during execution. This protocol checking is
critical in "un-debugged" (user-written) nodes. Examples of
such protocol violations occur many times in the facilities
of SOLO [BRIN 76]. Deadlock detection is also performed
based on data flow in a configuration which is distributed
across machines connected by a network IPC. Multiserver
subsystems, such as a data base management system, are
implementable as a configuration with multi-connection READ
(multiple condition WAITs) and conditional WRITE operations
provided on data transfer streams. Interconfiguration
connections are also provided. Finally, the command
interpreter and the node interface (PREFIX) provide all the
mapping of logical data streams (ports) onto implementation
data streams.


*  Supported in part by the Army Research Office under Grant
Number P-16160-A-EL.
SECTION 6
THEORETICAL WORK


6.1  WORKING GROUP SUMMARY REPORT

STRUCTURE of Discussion:

Distributed system without central (or any) control
Free ranging, undirected (no standards)

Principles, not mechanisms

Theory, not formalism

Independent of technology

Outline: Target drawn around arrows

WHAT IS A DISTRIBUTED SYSTEM?

A distributed system is one in which the communication
of data between processes takes a significant amount of
time compared to the time needed to execute one step of
a process.

LJS.£iE£iL£  •  P  D  P  .  1  0 

SPECIFICATION 

(Note: Numbers in parentheses are "pointers" to amplifying
material in paragraph 6.2.)

Definition: A specification is that which lets one decide if
a running system is behaving correctly.

State-free  Methods 

Applicative programming (6.2.1.1)

Teletype paradigm (6.2.1.2)

Observable I/O behavior (6.2.1.3)

State-based Methods (6.2.1.4)

State graphs (6.2.1.5)

Critical sections (6.2.1.3)

Problems

Avoid explicit state description (6.2.1.6)

How to specify complex systems (6.2.1.7)


Definition: A model exhibits the properties of an
implementation

MODELS  CONSIDERED  (Procedures  and  Files) 

General test and set model (6.2.2.1)

Bit transmission model (6.2.2.2)

Interpretive model (6.2.2.3)

OTHER MODELS (6.2.2.4)

Actor-Induction
LISP

etc.

RELEVANCE OF MODELS (6.2.2.5)

PROBLEM AREAS (6.2.2.6)

Existence  of  single  basic  model 


ANALYSIS

Inferring a system’s behavioral properties
Formal proofs of correctness (6.2.3.1, 6.2.3.2, 6.2.3.3)

Fault tolerance (6.2.3.4)

Performance

Measurements (6.2.3.5)

Complexity

Space (6.2.3.6)

Time (6.2.3.7)

Data transfer (6.2.3.8)

Simulation/emulation (6.2.3.9)

Problems (6.2.3.5)

Trade-off  techniques 
Relevance  of  models 

6.2 AMPLIFYING MATERIAL


6.2.1 Specification


6.2.1.1 Applicative Programming

Want to represent a system as a composition of
side-effect-free functions.

Can extend a "pure" applicative programming language with
constructs for multiprocessing:

- Suspended evaluation of subexpressions.

- Multisets - unordered collection of expressions
which becomes ordered as evaluations terminate.
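
The two constructs listed above can be illustrated with a short
Python sketch. It is only an illustration under assumptions that
are not in the workshop material: the names evaluate_multiset and
delayed are hypothetical, a zero-argument function stands in for a
suspended subexpression, and a thread pool stands in for the
processing nodes.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def evaluate_multiset(expressions):
    """Evaluate an unordered collection of suspended expressions.

    Each expression is a zero-argument function (a suspended evaluation).
    The returned list is ordered by termination, not by original position.
    """
    expressions = list(expressions)
    if not expressions:
        return []
    with ThreadPoolExecutor(max_workers=len(expressions)) as pool:
        futures = [pool.submit(expr) for expr in expressions]
        # as_completed yields each future as its evaluation terminates.
        return [future.result() for future in as_completed(futures)]


def delayed(value, seconds):
    """Build a suspended expression that yields `value` after a delay."""
    def expression():
        time.sleep(seconds)
        return value
    return expression


# The multiset is unordered; the result becomes ordered by termination.
values = evaluate_multiset([delayed("slow", 0.2), delayed("fast", 0.0)])
```

With these delays "fast" will usually be retrieved first, but that
order depends on scheduling and is not guaranteed, which is the
property the multiset is meant to capture.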


Encapsulation of expression evaluations gives alternatives
for distribution of computation: factor problem into
assigning "capsules" to processing nodes.

Potential disadvantage: In any "real" situation, there is a
need for some global reference; such a reference cannot be
handled if side-effects are forbidden.

Reference: [BUCK]


6.2.1.2 Teletype Paradigm

All that the user knows about a system is what goes in and
what comes out. What happens behind the panels is of no
concern to him. This view is captured by the following
paradigm. There are N users, each sitting at a teletype.
The system behavior consists of the N rolls of paper. The
correctness of this behavior must be decidable just from
looking at those teletype rolls.


6.2.1.3 Behavior by Interleaved Teletype Rolls

If I/O behavior is to be described in a way suitable for
reasoning about composition of systems, it is not sufficient
to consider only the separate "teletype rolls." It is
possible for two systems with the same individual port
behavior to be incorporated as modules in a larger system,
causing different external behavior for the larger system. A
sufficiently inclusive behavior description to avoid this
problem can be given by describing the interleaved teletype
rolls. Thus far, such descriptions have been used for simple
synchronization and data base behavior, and appear to be
quite natural and usable.
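
The difference between separate and interleaved rolls can be made
concrete with a small Python sketch. The event names, the two
histories, and the helper functions project and same_rolls are
hypothetical and chosen only for illustration; a history is assumed
to be a list of (port, event) pairs in the order they occurred.

```python
def project(interleaved, port):
    """Return the separate 'teletype roll' seen at one port."""
    return [event for p, event in interleaved if p == port]


def same_rolls(history_1, history_2, ports):
    """True if both histories give identical rolls at every port."""
    return all(project(history_1, p) == project(history_2, p) for p in ports)


# Two hypothetical interleaved histories over ports 1 and 2.
history_a = [(1, "in:x"), (1, "out:ack"), (2, "in:y"), (2, "out:ack")]
history_b = [(2, "in:y"), (2, "out:ack"), (1, "in:x"), (1, "out:ack")]

# The separate rolls cannot tell the two histories apart ...
indistinguishable_by_port = same_rolls(history_a, history_b, [1, 2])
# ... but the interleaved description can.
distinguishable_interleaved = history_a != history_b
```

Here both histories project to the same roll at each port, yet they
order the two users' interactions differently; only the interleaved
description records that difference.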


6.2.1.4 State-based Methods


A state-based specification method was used for the
algorithms in [BURN 78]. There the appropriate mutual
exclusion behavior was expressed by grouping process states
into "regions" comprising critical states, other program
states, and protocol states. Desired exclusion,
deadlock-free and fairness behavior was then described in
terms of the progress of processes through their regions.
Such description led to clean formal reasoning about the
processes. The description, however, does not appear to be
very easily suited for reasoning about the system as a
building block for larger systems.


6.2.1.5 State Graphs

Thiagarajan has used the global state model to give a simple
definition of Shapiro's algorithm for the maintenance of
redundant data bases in a distributed environment. This
permits an elegant and simple proof of correctness.


6.2.1.6 Jellybean Example


There are examples of simple systems in which one cannot
talk about the state of the system at any particular point
in time. The example involves two processes modifying the
number of jellybeans in a factory, and one process counting
the total number of jellybeans. The behavior of these three
operations cannot be explained by any sequential ordering of
their executions. How can we specify correctness of this
system in a sufficiently general way to allow this type of
implementation?

Reference: [LAMP 76].


6.2.1.7 How to Specify Complex Systems


We are faced with a dilemma. We do not want to have to
mention states in our specification. But it is very
difficult to write any non-trivial specification without
talking about states. For example, try specifying a memory
cell without talking about states.

6.2.2 Models


6.2.2.1 The Test-and-Set Model of IPC

The Test-and-Set primitive is a powerful indivisible
operation for accessing a shared variable for communication
among asynchronous processes. The model treats asynchronous
operation by considering timing sequences. Correct
algorithms must work for all timing sequences. Fairness
properties may require that the timing sequences be
restricted to those satisfying "finite delay." A sequence
satisfies finite delay if no process has to wait forever for
a timing message.

The Test-and-Set primitive is in one sense the most powerful
primitive possible. Hence, the lower bound results for this
model apply directly to all weaker primitives.
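
As a rough illustration of the primitive, the Python sketch below
simulates an indivisible read-and-update on a shared variable and
uses it for mutual exclusion. Everything named here (SharedVariable,
test_and_set, run_workers) is a hypothetical choice for this sketch,
and the internal lock merely stands in for the indivisibility that
the model assumes; it is not an algorithm from [BURN 78].

```python
import threading
import time


class SharedVariable:
    """A shared variable supporting a general, indivisible test-and-set."""

    def __init__(self, value):
        self._value = value
        # This lock only simulates the indivisibility assumed by the model.
        self._atomic = threading.Lock()

    def test_and_set(self, update):
        """Read the value and write update(value) in one indivisible step.

        Returns the value that was read.
        """
        with self._atomic:
            old = self._value
            self._value = update(old)
            return old


def run_workers(n_workers, rounds):
    """Each worker enters a critical section `rounds` times; return the count."""
    flag = SharedVariable(0)    # 0 = free, 1 = held
    counter = {"entries": 0}    # modified only inside the critical section

    def worker():
        for _ in range(rounds):
            # Entry protocol: spin until the value read was 0 (free).
            while flag.test_and_set(lambda v: 1) != 0:
                time.sleep(0)   # yield to other threads while waiting
            counter["entries"] += 1           # critical section
            flag.test_and_set(lambda v: 0)    # exit protocol: release

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["entries"]
```

This simple protocol gives exclusion but no fairness guarantee: a
waiting process is never promised entry, which is why stronger
properties require more elaborate algorithms.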

To model general distributed systems, it is necessary to
model processes and significant-distance communication. To
model a message channel in the simplest and most natural
way, we think of it as a special type of process with access
to two variables, one at each of its ends. The process
simply reads the contents of one of the variables and writes
the result in the other variable, ad infinitum. We imagine
this process to be asynchronous with respect to the other
processes in the system. Thus communication delays are
assumed to be arbitrary. This model seems simple and general
enough to provide a basis for simulating and comparing
distributed systems of practically any type.
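
The channel-as-process idea can be sketched as a small deterministic
simulation in Python. The schedule format, the actor names, and the
function simulate are assumptions made for this illustration; the
schedule fixes one possible timing sequence so that the effect of
arbitrary delay is visible.

```python
def simulate(schedule, messages):
    """Run one timing sequence and return what the receiver observed.

    schedule: list of actor names, each taking one step in turn.
    messages: values the sender writes, one per sender step.
    """
    source = {"value": None}       # variable at the sender's end
    destination = {"value": None}  # variable at the receiver's end
    pending = list(messages)
    observed = []
    for actor in schedule:
        if actor == "sender" and pending:
            source["value"] = pending.pop(0)
        elif actor == "channel":
            # The channel process: read one end, write the other end.
            destination["value"] = source["value"]
        elif actor == "receiver":
            observed.append(destination["value"])
    return observed


# A timing sequence in which the channel keeps pace with the sender.
in_step = simulate(
    ["sender", "channel", "receiver", "sender", "channel", "receiver"],
    ["a", "b"],
)
# A timing sequence in which the channel is slow: "a" is overwritten.
delayed_channel = simulate(
    ["sender", "sender", "channel", "receiver", "receiver"],
    ["a", "b"],
)
```

In the second run the receiver sees "b" twice and never sees "a".
Because the model places no bound on delays, any protocol built on
such a channel has to tolerate lost and repeated values.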


6.2.2.2 Bit Transmission Model

Lamport favors a more low-level IPC model: transmission of
1 bit of information from one process to another. This
requires a 1-bit storage device which can be written by
process A and concurrently read by process B. It is
non-trivial to implement an atomic register which acts as if
reads and writes are totally ordered. Some results are in
[LAMP 77]; others are unpublished.


6.2.2.3 Interpretive Model

The applicative technique uses an interpretive language to
describe a distributed system. An interpreter for the
applicative language may then serve to model system
behavior. The unordered evaluation of expressions in a
multiset becomes implemented as a scheduler. Communication
may be modeled in terms of the elapsed simulated time
associated with each parameter passing operation.

6.2.2.4 Other Models

Certain models, although significant, failed to receive
attention due to the lack of advocates in the group. Most
notable were the Actor-Induction Model of Carl Hewitt and
Petri Nets.


6.2.2.5 Relevance of Models

Models of distributed systems are abstractions of real or
hypothetical systems. The relevance of any abstraction
depends strongly on its intended application -- the
abstraction should preserve the important features of the
situation being modelled and discard the unimportant. Models
reflecting details of current technology are appropriate for
understanding present-day distributed systems but they
quickly become obsolete as the technology shifts. Models
attempting to capture the universal constraints on any
system imposed by basic laws of physics are more
fundamental, but evaluating their relevance to digital
systems requires a considerable understanding of electronics
and physics, and they will likely be too primitive and
detailed to shed much light on higher-level issues such as
those discussed elsewhere in this report.

For example, most models of parallel systems include some
sort of synchronization primitive, whether it be P and V,
monitors, message-passing, or whatever, and most practical
systems have hardware which implements these primitives
satisfactorily. However, the glitch problem apparently
prevents the construction of a perfect arbiter (as opposed
to one which is satisfactory because its probability of
failure is infinitesimally small), so any physical
realization of an arbiter has a possibility of failure
through infinite delay. The test-and-set model and the 1-bit
transmission model can both describe perfect arbiters and so
both must be considered only approximations to reality.
While test-and-sets seem at first sight to be far from
primitive, they encompass operations such as read, write,
increment memory, etc. which might or might not be atomic in
a given system, so lower bounds on complexity apply to all
such weaker models. The fact that a fair arbiter is needed
for a hardware realization of the model does not detract
from its usefulness in describing solutions to the critical
section problem, for building critical section solutions
with strong fairness properties (bounded-waiting, FIFO) from
arbiters only known to be free from lockout is a non-trivial
task.

6.2.2.6 Problem Areas

Although a number of models were proposed for interprocess
communication, we observed that there was no "basic unit" by
means of which all of them could be implemented. Identifying
such a basic unit would give a uniform scale for comparing
different communication mechanisms.


6.2.3 Analysis


6.2.3.1 State Graph Analysis


See 6.2.1.5.


6.2.3.2 Critical Region Algorithm Proof

A formal proof has been developed for one of the mutual
exclusion algorithms given in [BURN 78]. Although the proof
follows the general format of invariant-assertion proofs,
the major ideas in the parts of the proof that deal with
fairness are contained in precisely-stated lemmas which
mirror natural intuitive understanding of the algorithms.
The parts of the proof that deal with reachability of states
have a less intuitive and more case-analytic flavor. A
current effort is to decompose the invariants in a way that
will allow reachability properties also to be verified in a
way that accords with intuition.


6.2.3.3 Global Assertions

There are well-developed techniques for proving correctness
properties of non-distributed multiprocess programs.
Lamport used to feel that they were not good for distributed
systems because (1) they used global assertions which imply
a global system state, which is undesirable (see 6.2.1.6),
and (2) they require that communication arcs be represented
by processes, which means lots of processes. However, he
has recently discovered that these techniques do work well,
since (1) there seems to be a class of "good" global
assertions, and (2) you have to specify the communication
arcs very carefully anyway.

6.2.3.4 Fault Tolerance

We consider two types of failure: unannounced halting
(sleeping) and announced shutdown (dying). Peterson and
Fischer [PETE 77] and Rivest and Pratt [RIVE 76] give
critical section algorithms in a shared-variable read/write
model that are immune to process dying, i.e., the remaining
processes continue correct operation.

Performance and tolerance to failure by sleeping are closely
related. If one process can be hung up forever because it
is waiting for a failed process, then its performance will
be degraded by a non-failed process that is simply running
very slowly.

We have algorithms for the test-and-set model solving the
k-critical section problem which in a sense have k
independent paths to the critical section. That is, even if
k - 1 processes fail, the other processes will not be
waiting on them and will continue operating and gaining
access to the remaining resources.


6.2.3.5 Measurements

The traditional measures of "time" and "space" do not form
an adequate framework for assessing the complexity of
distributed computations. In order to understand the "cost"
of a distributed computation, we need to enlarge and refine
our collection of cost measures. For example, "time" may
refer to total time or time measured at an individual site.
Similarly, "space" could refer to either the size of the
total system or the size of individual sites. In addition
to the "time" and "space" required to perform a computation,
we should also consider the "amount of interprocess
communication," both the total traffic flow over the whole
system and the bandwidth requirements of individual
channels.

In analyzing sequential processes, we are used to thinking
in terms of time-space tradeoffs. Are there analogous
tradeoffs for distributed systems? For example, one can
usually get by with smaller individual processors if one is
willing to have more processors and, consequently, more
interprocessor communication. Can this tradeoff of
interprocess communication vs. complexity of individual
processes be made precise? Again, one usually has the choice
of either implementing shared global resources or
duplicating these resources at different sites. Are there
guidelines for deciding which of these strategies to pursue?
In general, we need to deal with the following sorts of
questions: (i) What are the characteristics of those
problems which allow one to make effective use of
distributed computation? (ii) Conversely, can we learn to
recognize problems whose solution would require such large
amounts of interprocessor communication as to render these
problems inherently unsuited for solution in a distributed
manner? (iii) Can we identify techniques for tailoring
distributed architectures to the solution of particular
computational problems? (iv) Can we formulate a theory which
combines concerns for time-space complexity with concerns
for minimizing interprocess communication, thus providing an
adequate framework for assessing the complexity of
distributed computations?


6.2.3.6 Space Complexity for IPC

In measuring space complexity for IPC, the shared variable
models provide a natural measure - simply the number of
states necessary in the shared variables. Tight upper and
lower bounds on the communication space required have been
demonstrated for certain synchronization problems using the
Test-and-Set model. Additional bounds are anticipated for
other problems and primitives.

Reference: [BURN 78]


6.2.3.7 Time Complexity Measures for IPC

A great deal of work has been done on the time complexity of
sequential algorithms. Synchronous parallel computations
commonly use a "tree depth" measure for the time
complexity. These techniques do not extend easily to
asynchronous parallel processing because there is no
measure of global time directly derivable from the steps of
the individual processes. For example, if any process
reaches a state where it must wait for communication from
another process, it may take an unbounded number of steps
before the remainder of the system changes state. Since a
simple sum of all processor steps would often give unbounded
(and hence uninteresting) lower bounds for many problems,
new measures are needed. Current work is examining time
bounds of test-and-set algorithms using the following types
of bounds.

1) Count the total number of "transitions"
between two events of interest.

2) Count the number of transitions of a
particular process between two events.

3) Count the total number of transitions between
two events divided by the number of processes
involved.

(A "transition" is a step of a process which causes a change
in the shared variable.) Each of these bounds appears to be
of interest.
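
The three counts can be computed from a recorded execution as in the
Python sketch below. The trace format and the function name
transition_measures are assumptions made here for illustration: a
trace is taken to be the list of steps between the two events of
interest, each step a pair (process, changed) where changed says
whether the step altered the shared variable. "Processes involved"
is read as the processes that made at least one transition, which
is one possible interpretation.

```python
from collections import Counter


def transition_measures(trace, process):
    """Compute the three transition counts for a trace between two events."""
    # Only steps that change the shared variable count as transitions.
    changed_by = [pid for pid, changed in trace if changed]
    total = len(changed_by)
    per_process = Counter(changed_by)
    involved = len(per_process)
    return {
        "total": total,                                  # measure 1
        "by_process": per_process[process],              # measure 2
        "per_involved_process":                          # measure 3
            total / involved if involved else 0.0,
    }


# Hypothetical trace: B's first step only reads, so it is not a transition.
trace = [("A", True), ("B", False), ("B", True), ("A", True), ("C", True)]
measures = transition_measures(trace, "A")
```

Steps that merely wait or re-read the variable are excluded, which
is what keeps these counts bounded when a process busy-waits.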

6.2.3.8 Data Transfer Performance

Abelson [ABEL 78] has recently developed techniques for
proving inherent lower bounds on the amount of interprocess
communication required for performing computations in a
distributed system. Using these techniques, he has analyzed
distributed systems which perform matrix operations and
solve systems of linear equations. His work shows that,
from the point of view of minimizing communication, the
obvious techniques are optimal.


6.2.3.9 Performance Results

An alternative (perhaps a copout) to formal analysis is to
use a simulation or emulation. This, however, is not an
entirely straightforward proposition. First, a suitably
accurate description of the distributed system must be
derived and second, the artificialities of the
simulation/emulation must be factored out.

6.3  miiui  EAE&aa


6.3.1 Abelson

Theoretical Issues in Distributed Computation

by

Harold Abelson
MIT

Current research in the area of distributed computation
focuses almost exclusively on algorithms and systems, while
the problem of determining the inherent complexity of
distributed computations remains virtually unexplored.
Moreover, most theoretical work in the area of parallel
processing relies on a model of computation in which all
processors have ready access to all memory registers -- an
assumption which is unrealistic when dealing with
distributed computations. For example, although the
solution of n linear equations in n unknowns can be
accomplished in order (log n)**2 steps if one ignores
information transfer, it can be shown that, for typical
interconnection configurations among n processors, the
interprocessor data transfers alone require on the order of
n steps.

We need to address directly the problem of interprocessor
data transfer and to establish bounds on the amount of
communication required for a wide variety of problems in a
wide variety of distributed architectures. In general, we
need to deal with the following sorts of questions: (i) What
are the characteristics of those problems which allow one to
make effective use of distributed computation? (ii)
Conversely, can we learn to recognize problems whose
solution would require such large amounts of interprocessor
communication as to render these problems inherently
unsuited for solution in a distributed manner? (iii) Can we
identify techniques for tailoring distributed architectures
to the solution of particular computational problems? (iv)
Can we formulate a theory which combines concerns for
time-space complexity with concerns for minimizing
interprocess communication, thus providing an adequate
framework for assessing the complexity of distributed
computations?

6.3.2 Fischer

Time  Complexity  of  Distributed  Computations 

by 

Michael  J.  Fischer 
University  of  Washington 

A fundamental question in the theory of distributed
computing is how well a particular system does its job. To
determine this, one needs a specification of the job and a
means of comparing the efficiency of the given system with
other candidate systems.

Three aspects of distributed systems complicate considerably
the specification of the desired behavior. First of all,
non-terminating computations tend to be the rule rather than
the exception, so infinite execution sequences must be
described. Secondly, because of variability in the relative
speeds of the different processes, the system is inherently
non-deterministic. While determinate behavior is
nonetheless possible, it may not be required, so the
specification must allow for variability in the observed
behavior. Finally, the inputs and outputs of a distributed
system may be dispersed over a number of sites, and the
communication aspects of the problem need to be captured in
a natural way.

Finding a satisfactory time measure for distributed systems
is much more difficult than for sequential programs. In the
latter case, elapsed time is just the sum of the times of
the basic instructions. With parallel computations, certain
steps may execute concurrently, so the simple linear
dependence of elapsed time on the instruction speed is lost.
For this reason, it becomes attractive to look instead at
the dependencies between steps of various processes rather
than at elapsed time. When these dependencies are
represented as a partial order, the longest path through the
order gives a natural measure that reflects the time
necessary, assuming maximum concurrency.
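
The longest-path measure can be sketched in Python as follows. The
representation is an assumption made for this illustration: the
partial order is given as a mapping from each step to the steps
that must precede it, every step is taken to cost one unit of time,
and the function name longest_path_length and the step names are
hypothetical.

```python
def longest_path_length(depends_on):
    """Length, in steps, of the longest chain in an acyclic dependency map.

    depends_on maps each step to the steps that must precede it.
    """
    depth = {}

    def visit(step):
        # Depth of a step = 1 + the greatest depth among its predecessors.
        if step not in depth:
            predecessors = depends_on.get(step, ())
            depth[step] = 1 + max((visit(p) for p in predecessors), default=0)
        return depth[step]

    return max((visit(step) for step in depends_on), default=0)


# Hypothetical execution: process a takes steps a1, a2; process b takes
# b1, b2, b3, where b2 must also wait for a message sent at a2.
order = {
    "a1": [],
    "a2": ["a1"],
    "b1": [],
    "b2": ["b1", "a2"],
    "b3": ["b2"],
}
time_with_max_concurrency = longest_path_length(order)
```

The example has five steps in total, but the longest chain (a1, a2,
b2, b3) has only four, so with maximum concurrency the measure is
four rather than the sequential sum of five.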

Once we have a satisfactory notion of the execution time for
a particular interleaved sequence of steps, it is still not
clear how to base a comparative analysis of systems on this
information, for different systems solving the same problem
will not necessarily exhibit the same interleavings. What
is needed is a set of parameters common to all solution
systems in terms of which the time can be expressed.

Finally, the relative efficiency of a system may depend
strongly on whether one is interested in some notion of
total system throughput or in response time at a given site
(or in some other quantity).

6.3.3 Lamport


Theory  and  Formalism 
by 

L.  Lamport 
SRI  International 

Formal methods are needed to specify and prove the
correctness of distributed systems. The primary requirement
for a specification is that it be understandable by humans,
since only a human can determine the correctness of a
specification. Moreover, a specification involving program
variables does not meet this criterion, since program
variables are part of the solution, and are of no concern to
the user. There has been very little progress in this area.
It is rare to find even a precise informal statement of what
a simple distributed algorithm is supposed to do -- let
alone a specification of an entire system.

A formal specification is useful only if there is some
formal method for deciding if a system meets its
specification. Currently, there exist formal methods for
proving properties of non-distributed multiprocess systems.
We need to discover how these methods can be extended to
distributed systems, or else develop new methods. There has
been some progress in this area, but we are very far from
being able to handle real, complex systems.

I feel that in order to make progress in these areas, it is
necessary to be able to deal formally with non-atomic
operations -- to describe the system as a collection of
operations which do not act as if they were executed in any
sequential order. I have some vague, preliminary ideas on
how this can be done.


6.3.4  Lynch 

Complexity  Theory  of  Distributed  Systems 

by 

Nancy  Lynch 

Georgia  Institute  of  Technology 

Most of the current work in theory of distributed systems
seems to me to focus on a rather high level of programming.
Virtual machines and networks, Hoare-style communication
mechanisms which combine powerful synchronization and
value-passing behavior, related mechanisms which assume
preservation of unbounded numbers of messages, serializers,
abstract data types with "nonatomic" elements, etc. are all
user-oriented abstractions which allow logical organization
of complex algorithmic behavior without concern for
troublesome implementation detail. Unfortunately, there are
good reasons why such detail cannot entirely be suppressed.
Efficiency of operation of a distributed system is of
paramount concern to the user. There are so many more
possible variations in implementation in a distributed
environment than in more traditional computing environments
that knowledge of the implementation method cannot help but
influence the user’s program design; indeed, some such
knowledge is probably necessary for even acceptably
efficient use of the system.

It is important to complement high-level theoretical and
language-design work with a firmly-based theory of
lower-level distributed programming, geared particularly to
measurement of the efficiency of performance. Very simple
and general primitives such as shared variables and one-way
arbitrary-delay communication channels should be used as a
general basis for such a theory. Various appropriate
measures of resource use and performance (e.g.,
communication "bandwidth", total number of changes to
variables that occur, total "depth" of the computation) can
then be defined precisely. Then the costs of implementing
the various high-level mechanisms mentioned above can be
assessed objectively and compared. While the user might not
need to know precise implementation details, he would at
least benefit from knowledge of these costs in resource use
for the various available mechanisms.

As for sequential computing, the theory of distributed
systems will not ultimately be concerned with implementation
of different system primitives, but with efficient
fulfillment of application requirements. Thus, the theory
can be expected to focus on design and analysis of systems
exhibiting certain desired behavior, in application areas
suitable for distributed computing (e.g., load-sharing,
multiple use of databases, mail communication,
synchronization). A low-level model and elementary
complexity measures such as those described will form a
useful basis for such analysis, with higher-level constructs
used along the way. Also important for such a theory will
be the development of reasonably consistent means of
specifying desirable behaviors for systems. Such behaviors
might involve the input-output interface of a system or the
internal state behavior of processes.

A prototypical development has been carried out (jointly
with Michael J. Fischer and graduate students J. Burns, P.
Jackson, and G. Peterson) for simple mutual exclusion
behavior. Further work is currently in progress.

6.3.5 Smoliar


Theory and Formalism
by

Stephen W. Smoliar

Conventional modes of programming and algorithmic
specification have many potential shortcomings in the design
and implementation of distributed systems. In his 1977 ACM
Turing Award Lecture, John Backus cited seven "inherent
defects at the most basic level" in traditional programming
languages: "their primitive word-at-a-time style of
programming inherited from their common ancestor -- the von
Neumann computer, their close coupling of semantics to state
transitions, their division of programming into a world of
expressions and a world of statements, their inability to
effectively use powerful combining forms for building new
programs from existing ones, and their lack of useful
mathematical properties for reasoning about programs."
Unfortunately, a good deal of thinking about distributed
systems has become bogged down precisely because of a
preconceived commitment to these same inherent defects.

A fruitful alternative is the functional style of
applicative programming. The central idea is that all
programs are expressed as functions. The coupling of a
function with its arguments constitutes an expression, and a
process is that computational activity involved in the
evaluation of an expression. The most important aspect of
this approach is that it has eliminated the need for the
assignment statement, since the only allowable assignments
are parameter bindings. Recursive composition of functions
eliminates the need for loops (and with it many of the
concerns of structured programming). Finally, input/output
functions may be transcended by a view of files as arguments
and values of expressions.

Multiprogramming concepts may be best expressed in
applicative terms by introducing a data structure known as a
multiset. A multiset may be viewed as an unordered
collection of expressions whose evaluations may proceed in
parallel. Retrieval of data from a multiset is contingent
upon termination (also known as convergence) of at least one
evaluation process, and retrieval effectively transforms a
multiset from an unordered collection of expressions into an
ordered sequence of values. Furthermore, multisets may be
constructed through multiple applications of the same
function to each of the elements of an already-constructed
multiset. Finally, the conventional conditional expression
may be extended to control whether or not an evaluation
process ever converges: if the predicate of a [illegible]
conditional is not true, then the evaluation process
automatically diverges.

It is thus possible to formulate algorithms for distributed
systems in terms of a rather simple applicative language.
In fact, the applicative language provides a very powerful
tool for the study of distributed systems; this tool is the
language's interpreter. Such an interpreter must know how
to implement the evaluation of expressions, but, more
importantly, its definition must include a protocol for how
multisets are constructed and how their elements are
evaluated. This protocol may be instrumented to reflect the
behavior of a real-time environment. The interpreter thus
provides a basis for simulation experiments within which one
may investigate how multiple processors may be profitably
applied to multiset interpretation.
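
The multiset behavior described above can be sketched in a
few lines of Python. This is only an illustration under
stated assumptions, not the applicative language or the
interpreter discussed in the text: expressions are modeled as
zero-argument functions, a thread pool stands in for the
parallel evaluators, and the names retrieve, apply_to_each,
and delayed are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def retrieve(multiset):
    """Evaluate all expressions in parallel; yield values as evaluations converge."""
    with ThreadPoolExecutor(max_workers=max(1, len(multiset))) as pool:
        futures = [pool.submit(expr) for expr in multiset]
        for done in as_completed(futures):
            yield done.result()


def apply_to_each(fn, multiset):
    """Construct a new multiset by applying fn to every element of an existing one."""
    return [lambda expr=expr: fn(expr()) for expr in multiset]


def delayed(value, seconds):
    """Hypothetical expression whose evaluation takes a given amount of time."""
    def expr():
        time.sleep(seconds)
        return value
    return expr


multiset = [delayed("a", 0.03), delayed("b", 0.01), delayed("c", 0.02)]
values = list(retrieve(apply_to_each(str.upper, multiset)))
print(values)  # an ordered sequence; the order reflects which evaluation converged first
```

Retrieval turns the unordered collection into an ordered
sequence whose order is decided by convergence, which is why
the printed order is not fixed by the program text.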


SECTION  7 

CURRENT  TECHNIQUES  AND  EXPERIENCE 


7.1 A PROCESS BASED COMPUTER SYSTEM


An Informal Paper
by

Ed Basart

Hewlett-Packard Company

Processes are the basic entity in our computer system. When
a program runs, it exists as a process, which gives the
program the illusion that it has its own private processor.
The system is then constructed to support processes
effectively by making process communication and switching
efficient and inexpensive. As a consequence, multiple
processors can be used to increase the parallelism of the
processes running in the system.

The advantages of such a computer system are program
modularity, increased performance through parallelism,
growth by adding processors, and physical distributability
of functions. Processes are used as the single "object"
that unifies operating system services and resources. The
operating system exists as a collection of processes, and
process primitives are used as the kernel of the operating
system.

Processes communicate using queues and the send and receive
primitives. Multiple queue writers are permitted, while
only a single queue reader is allowed. Send and receive
handle the details of the path between processes for any
arbitrary hardware configuration of processors. This
includes providing mutual exclusion for processors sharing
memory and invoking data communication drivers in systems
not sharing memory. The data communications processes
resolve the connection between processors, whether the
connection is a high speed bus, a telephone line, or an
indirect path through more than one processor.

In order to send a message to another process, the sending
process must first establish a link to a receiving process
queue. Links are made by the file system. Opening a link
is very much like opening a disc file. Capabilities and
access rights to queues are checked at open time by the file
system, which eliminates message verification for the send
and receive primitives, and also for the communicating
processes.

After a link is open, the sending process sends a message to
a receiving process by specifying a link number, along with
the data. The receiving process reads its queue by
specifying its queue number and issuing a receive. The
receiving process creates a queue initially by asking the
file system to allocate space for the queue and grant the
receiver "queue" access. Linking a sending and a receiving
process establishes half duplex communication. Full duplex
communication may be established by creating another queue
and opening another link in the opposite direction between
the two processes.
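
The open-then-send pattern just described can be illustrated
with a small Python sketch. It is a model under stated
assumptions, not the Hewlett-Packard implementation: the
class FileSystem and its method names are hypothetical, queue
and link numbers are simple counters, and a receive on an
empty queue returns None instead of waiting.

```python
from collections import deque


class FileSystem:
    """Hypothetical stand-in for the file system that allocates queues and opens links."""

    def __init__(self):
        self.queues = {}   # queue number -> (reader, allowed writers, deque)
        self.links = {}    # link number -> queue number

    def create_queue(self, reader, writers):
        qnum = len(self.queues) + 1
        self.queues[qnum] = (reader, set(writers), deque())
        return qnum

    def open_link(self, writer, qnum):
        # Access rights are checked once, here, at open time.
        if writer not in self.queues[qnum][1]:
            raise PermissionError(f"{writer} may not write to queue {qnum}")
        lnum = len(self.links) + 1
        self.links[lnum] = qnum
        return lnum

    def send(self, lnum, data):
        # No per-message verification: holding an open link is sufficient.
        self.queues[self.links[lnum]][2].append(data)

    def receive(self, qnum):
        queue = self.queues[qnum][2]
        return queue.popleft() if queue else None


fs = FileSystem()
requests = fs.create_queue("server", {"client"})
replies = fs.create_queue("client", {"server"})   # a second queue gives full duplex
to_server = fs.open_link("client", requests)
to_client = fs.open_link("server", replies)

fs.send(to_server, "read block 7")
request = fs.receive(requests)
fs.send(to_client, "block 7 data")
reply = fs.receive(replies)
```

Because the check happens in open_link, send carries no
verification cost per message, which is the saving the text
attributes to checking rights at open time.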

As the file system opens a link, it determines whether the
two processes are residing on different computers. If so,
the address placed in the link is that of a surrogate
process, a data communications driver that handles the
details of the communication line. At the other end of the
line is another surrogate data communications process. This
process has a link pointing to the receiving process queue.
This mechanism allows uniform process communication for both
local and remote processes.

Creating a single queue for multiple writers seems to be a
mixed blessing. One advantage is that the system makes a
single space allocation for the queue, and no new
allocations need to be made for each writer. Another
advantage is that the reader goes to only one location to
read messages. This is particularly important when the
writers and reader exist on different computers.

The disadvantage of a single queue is that a "mad" writer
can clog the queue. There are two solutions to this
problem. The system can be made cognizant of a writer's
"message rate," and a process can be given lower execution
priority if its rate becomes too high. The other solution
is to maintain a message count for each writer. The reader
then decrements the count as the queue is read.

Neither of these solutions is very attractive. They both
suggest high cost to protect against the mad writer. For
the present the approach is to make queues large enough to
absorb an initial outburst from the writer. The reader is
given a "break link" function that disallows any further
messages from a particular writer. This forces detection of
the problem on the communicating processes while relieving
the send and receive primitives of an added complication.
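
A minimal Python sketch of the "break link" approach follows.
Several details are assumptions made only for illustration:
the class name BoundedQueue, the idea that a send to a full
queue or over a broken link simply reports failure, and the
choice to leave messages already queued by the broken writer
in place. The text specifies only that the queue is large
enough for an initial burst and that the reader can disallow
further messages from one writer.

```python
from collections import deque


class BoundedQueue:
    """Single-reader queue with a fixed capacity and a reader-side break_link."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()      # (link number, data) in arrival order
        self.broken = set()       # links the reader has broken

    def send(self, link, data):
        """Return True if the message was queued, False otherwise."""
        if link in self.broken or len(self.items) >= self.capacity:
            return False
        self.items.append((link, data))
        return True

    def receive(self):
        return self.items.popleft() if self.items else None

    def break_link(self, link):
        """Disallow any further messages arriving over this link."""
        self.broken.add(link)


q = BoundedQueue(capacity=4)
burst = [q.send(1, n) for n in range(6)]     # a "mad" writer on link 1
q.break_link(1)                              # the reader detects it and reacts
drained = [q.receive() for _ in range(4)]
after_break = q.send(1, "more noise")
other_writer = q.send(2, "useful work")
```

Detection stays with the reader, as in the text: the queue
itself only absorbs the burst and enforces the break.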

Three similar computer systems have been influential in the
design of our system. They are: 1) the Tandem 16 computer
system manufactured in Cupertino, California, 2) the Demos
operating system for the Cray-1 computer at Los Alamos, New
Mexico, and 3) the Thoth operating system developed at the
University of Waterloo, Ontario.

Our system has two primary differences from the systems
mentioned. The first is in handling all types of physical
processor interconnections at the primitive level, rather
than doing it in the operating system. The second is in
making much greater use of processes and messages. All of
the above systems break away from their message systems for
certain types of functions that are considered to be too
expensive to be done in a message system.


7.2 IPC IN HETEROGENEOUS DISTRIBUTED COMPUTER NETWORKS

HETEROGENEOUS DISTRIBUTED COMPUTER NETWORKS
AND INTERPROCESS COMMUNICATION THEREIN

by 

J.  S.  Sventek 

Lawrence  Berkeley  Laboratory 


7.2.1 Introduction

The primary focus of the Advanced Systems Group in CSAM is
the question of distributed processing in a network
consisting of hosts with vastly differing architectures. Our
main goal, at this point in time, is to provide a distributed
environment which is easily used by people with very diverse
needs; for example:

1) a research group developing a distributed
   relational database system

2) administrative personnel maintaining current
   accounting databases

3) graphics researchers exploring new and novel
   representations

4) high energy physicists designing systems to
   collect and sample on-line vast quantities of
   experimental data


In order to achieve the goal of easy use, we are somewhat
less concerned with "efficiency" issues than with merely
making the system functional. From empirical studies of a
working system, we hope to discern the "inefficient" aspects
of the system, and may then devise algorithms to alleviate
the problems. Efficiency, in this context, is only concerned
with throughput.

Two entities must exist before an easily used distributed
system can be realized:

1) a common shell (command line interpreter).

   It is of somewhat limited utility to provide
   virtual terminal capabilities on the hosts in
   the network if the user must learn a
   different language to communicate with each
   one. Much of our recent research has been in
   the development of just such a portable
   shell. A prototype of this shell is currently
   running on the following systems: VAX-11/780
   (VMS), PDP-11/70 (IAS), CDC 6600 (homegrown
   operating system).

2) a common file naming convention. Current
   research (based on a pathname structure) is
   progressing in this area, and a prototypical
   system is operational on the PDP-11/70 (IAS)
   system.


The rest of the discussion will assume that these two
entities exist on all hosts in the network.


7.2.2 Fundamental Quantities in a Computer System

There are three basic quantities in a civilized computer
environment which a programmer must be able to manipulate.
They are:

1. file - this category includes non-file structured
   devices (e.g., tt0, mt0, etc.), data files, and
   executable image files.

2. process - this entity describes an image file
   plus its context (standard input, output, and
   error files, default directory, privileges,
   etc.) which is currently in a schedulable
   state or waiting upon some resource in order
   to become schedulable in a particular host.

3. vertex - this "virtual" entity allows two
   processes to establish an interprocess
   communication link.


Several operating system primitives are necessary to allow a
programmer to manipulate these quantities.


File primitives

    open      open a file
    close     close a file
    create    if file exists, open it, else create it
    delete    delete file
    rename    rename file
    getc      get a character from a file
    putc      put a character into a file
    mark      note current position in a file
    seek      position a file
    prompt    output string with no terminating carriage
              control

Process primitives

    spawn     spawn process, sending specified arguments
              to it
    pstat     query status of a process
    kill      terminate process
    suspnd    suspend process
    resume    resume suspended process

Vertex primitive

    pipe      create a vertex and open a link to it

A few more words concerning vertices are in order. A vertex
is a valid input parameter to the open and close primitives.
In this way, subprocesses may be linked together by
redirecting the respective standard outputs and standard
inputs to a vertex. The subprocess itself is oblivious to
the source or destination of its information. A vertex is
also a transitory quantity, in the sense that when all links
to it have been terminated (via a close operation), it
vanishes. All I/O through a vertex should be synchronous to
avoid all of the problems inherent in buffering asynchronous
I/O in dynamic system memory.


7.2.3 Naming Conventions

Files  are  known  globally  by  their  pathnames: 

/hostname/default  directory/filename 

Once a process has established a link to a file (via an open
or create), the file is then known internally to the process
by the id returned as the value of the primitive function
invoked.

Processes are known globally by the id returned as a
parameter of the spawn primitive:

/hostname/processid


Vertices  are  known  globally  by  the  following  pathname: 
/hostname/processid/vertexname 


One  sees  that  as  long  as  the  first  field  of  a  file  pathname 
can  never  assume  the  value  of  a  process  id  field,  this  nam¬ 
ing  convention  uniquely  identifies  all  quantities. 
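
The uniqueness argument can be made concrete with a short
Python sketch. It assumes, purely for illustration, that
process ids are strings of decimal digits and that directory
names never are; the text requires only that the two sets of
values not overlap. The function name classify is
hypothetical.

```python
def classify(pathname, is_process_id):
    """Decide whether a global name denotes a process, a vertex, or a file."""
    fields = pathname.strip("/").split("/")
    if len(fields) == 2 and is_process_id(fields[1]):
        return ("process", fields[0], fields[1])
    if len(fields) == 3 and is_process_id(fields[1]):
        return ("vertex", fields[0], fields[1], fields[2])
    if len(fields) == 3:
        return ("file", fields[0], fields[1], fields[2])
    raise ValueError(f"not a recognized pathname: {pathname}")


# Assumption: process ids are all digits, directory names are not.
process = classify("/A/1234", str.isdigit)
vertex = classify("/A/1234/pipe1", str.isdigit)
data_file = classify("/A/DEFAULT/mydata", str.isdigit)
```

If a directory could be named "1234", the second and third
cases would be indistinguishable, which is exactly the
condition stated above.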


7.2.4 Implementation in a Distributed Environment

A skeleton of a typical primitive would look as follows:

    if (local(ARGUMENTS) == YES)
    {
        perform function
    }
    else
    {
        reformulate request (if necessary)
        forward request to KERNEL
        wait for result
    }


The purpose of the local function is to determine if the
request can be performed within the requesting process.
(File and process oriented primitives can usually be
performed locally if they involve local files and processes.)
If it cannot be performed internally, the request may have
to be reformulated to include process context information,
and is then forwarded to the KERNEL, which is an extension
of the native operating system. Due to differences in the
services provided by most native operating systems, one sees
that the local function will be system dependent. The
KERNEL is a separate process, one per host, which has access
to the physical links of all hosts in the network which are
directly connected to the current host. The KERNEL fields
three types of requests:

1.  local  requests  for  local  services  not 
provided  by  the  native  operating  system 

2.  local  requests  for  services  on  remote  hosts 
in  the  network 

3.  remote  requests  for  local  services  on  behalf 
of  a  requestor  on  a  remote  host 


For the first type of request, the KERNEL will perform the
service, and return status and any other information to the
requestor. The last two types of requests are linked in
their function. For type 2, the KERNEL forwards the request
to its counterpart, which receives a request of type 3.
This request is performed, and return information is
forwarded to the original requestor through the network.
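
The relationship between the three request types can be
sketched as follows. This is a simplified model, not the
Lawrence Berkeley implementation: each KERNEL is a Python
object, not a separate process, the call is synchronous, every
host is assumed to be directly connected to every other, and
the names Kernel, request, services, and log are hypothetical.

```python
class Kernel:
    """Hypothetical per-host KERNEL, reduced to a synchronous object."""

    def __init__(self, host, network, services):
        self.host = host
        self.network = network      # shared dict: host name -> Kernel
        self.services = services    # operation name -> callable(path)
        self.log = []               # which type of request was fielded
        network[host] = self

    def request(self, op, target, origin=None):
        host, _, path = target.partition("/")
        if host != self.host:
            # Type 2: local request for a service on a remote host.
            self.log.append(("type 2", op, target))
            return self.network[host].request(op, target, origin=self.host)
        # Type 1 if the requestor is local, type 3 if it is remote.
        kind = "type 1" if origin is None else "type 3"
        self.log.append((kind, op, target))
        return self.services[op](path)


network = {}
a = Kernel("A", network, {"open": lambda path: f"A opened {path}"})
b = Kernel("B", network, {"open": lambda path: f"B opened {path}"})

remote_result = a.request("open", "B/path/file")
local_result = a.request("open", "A/DEFAULT/mydata")
```

A type 2 request on host A always shows up as a type 3
request on host B, and the result travels back along the same
path to the original requestor.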

All types of distributed activity are then supported in such
a network environment. The following examples will serve to
emphasize this point.


7.2.5

Virtual terminal

User is currently interacting with the shell on host A, with
standard input, output, and error files being ttn, and
default directory DEFAULT. User wishes to establish a virtual
terminal connection with host B. To do so, he/she issues
the following command at his/her terminal:

% B/shell

A/shell detects that this is a request to spawn a process at
another host, so it reformulates the command as

B/shell <A/ttn >A/ttn >*A/ttn (DEFAULT)

and forwards the request to A/KERNEL, which, in turn,
forwards the request to B/KERNEL, which performs the service
and returns status to the requesting process via A/KERNEL.
The next prompt that the user sees will be that of the shell
operating on host B, with the shell on A being suspended
until B/shell has received an end of file on the standard
input.

i£^na£2££Q£l  1&  Q2li ££  litAiliiSi 

User on host A wishes to copy a file from host A to host B;
he issues the following command:

% copy file B/path/file

The shell will spawn copy; copy will open file, and attempt
to open B/path/file. The open request will be forwarded to
A/KERNEL, which in turn forwards the request to B/KERNEL.
B/path/file will be opened, and all writes to it will be
directed through the KERNELs and the network link.

Interprocess communication between processes on different
hosts

User on host A wishes to analyze a data file with a utility
available on host B, directing the output of that utility to
a graphic display program on host A which displays the
results on the user's graphics terminal.

% B/analyze <mydata | A/graphit

A/shell will issue a spawn request to A/KERNEL with the
following command line

B/analyze <A/DEFAULT/mydata >A/shellid/pipe1 &

where A/shellid/pipe1 is a vertex created by A/shell. The
ampersand (&) indicates that A/shell does not wish to wait
for the completion of the spawned process. A/shell will
also spawn A/graphit, redirecting its input to
A/shellid/pipe1. A/shell can then sit back and monitor the
progress of the two cooperating processes, regaining control
when they complete or terminating them if errors occur
during their execution.
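
The reformulation step in this last example amounts to
qualifying every unqualified name with the local host and
default directory and naming a vertex for the pipe. A
minimal Python sketch, assuming a hypothetical rule that a
name is already qualified when its first field is a known
host:

```python
def qualify(name, host, default, hosts):
    """Prefix host and default directory unless the name already starts with a host."""
    first = name.split("/")[0]
    return name if first in hosts else f"{host}/{default}/{name}"


def reformulate(command, stdin, host, default, shell_id, hosts):
    """Build the spawn command line for the producer side of a pipe."""
    vertex = f"{host}/{shell_id}/pipe1"   # vertex created by the local shell
    return f"{command} <{qualify(stdin, host, default, hosts)} >{vertex} &"


line = reformulate("B/analyze", "mydata", "A", "DEFAULT", "shellid", {"A", "B"})
print(line)  # B/analyze <A/DEFAULT/mydata >A/shellid/pipe1 &
```

The functions qualify and reformulate are illustrative names
only; the text does not describe how the shell recognizes a
host field.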


7.3 PROTECTED MAILBOXES AS AN IPC MECHANISM

by

R.L. Gordon
PRIME Computer, Inc.


Keywords: mailbox, IPC primitives, switch-board tasks,
access lists


It is the thesis of this short note that IPC facilities
built around the notion of a protected mailbox could provide
the basis for a robust set of primitives. Robustness, in
this case, implies their utility in conventional
multiprogrammed uniprocessor systems as well as shared memory
multiprocessors, loosely coupled multiprocessors, and local
and long haul networks. The proposed mechanism can support
different communication forms (N-process protocols), address
security issues, and assist users in the synchronization of
what is basically an asynchronous phenomenon (process
communication).


7.3.2  Proposed  IPC  Primitives 

Mailboxes are created by a process "P" executing a primitive
of the form:


u = create(Access_List, T)

which is sufficient to bind the process name "P" to the
unique descriptor "u" of the created mailbox, and associate
the list of processes appearing in the "Access_List" with
the mailbox "u". In addition the create primitive specifies
a maximum time "T" between mailbox uses (I assume mailboxes
that are not used are not useful). Thereafter, if the
identifier "u" is valid (e.g., not equal to ERROR), then any
process "P'" appearing on the "Access_List" and wishing to
send mail to the process "P" would use a system call of the
form:


send message(buf, u)

and continue execution. This primitive would have the
effect of eventually placing the contents of "buf" in the
mailbox "u" of process "P" along with the name of the sender
"P'". Process "P", wishing to receive messages in mailbox
"u", would make a system call of the form:

receive message(buf, u)

which would prohibit any further progress of "P" until
either a message is received from a process on the
"Access_List" or no message has been received during the
time interval "T" specified in the "create" primitive.
Notification of this fact would appear as a message in
"buf" if the user had included a system process responsible
for communication monitoring in his "Access_List". [See
Section 7.3.6 on Fault Tolerant Aspects.] To complete the
set of primitives, a system call of the form:

delete(u)

would cause the mailbox "u" to be retired forever.


7.3.3  Initialization 

Initial dialogues are established by "receiving" an
identifier "s" of the current system mailbox in a mailbox
"r" that was originally created with only the name of a well
known system process on the access list. The system mailbox
identifier "s" would then be used to send messages to the
system kernel, with replies being received in mailbox "r".

One of the more difficult issues is the design of the
mechanism needed to establish communication with generic
processes (e.g., processes that represent a single service
but may have multiple instantiations) and to discover
newly created processes. The trouble stems from the fact
that users are incapable of establishing a dialogue with any
process not known to them, and therefore cannot include such
processes on the access list. For these reasons, it seems
desirable to provide a "switch-board process" whose sole
function is to provide a generic to specific name mapping.
For example, such a service would be used to return the
specific process name (or names) of the latest version of a
fancy text formatter, when supplied with the generic name
"format".
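
A switch-board process is essentially a lookup table from
generic names to specific process names. The following
Python sketch shows the mapping only; the class name
SwitchBoard, its methods, and the process names used are
hypothetical, and in the proposed scheme the lookups would
arrive as messages in the switch-board's own mailbox.

```python
class SwitchBoard:
    """Generic-to-specific name mapping, newest registration last."""

    def __init__(self):
        self.table = {}   # generic name -> list of specific process names

    def register(self, generic, specific):
        self.table.setdefault(generic, []).append(specific)

    def lookup(self, generic):
        """Return every specific name currently registered for a generic name."""
        return list(self.table.get(generic, []))

    def latest(self, generic):
        """Return the most recently registered specific name, or None."""
        names = self.table.get(generic)
        return names[-1] if names else None


board = SwitchBoard()
board.register("format", "format.v1")
board.register("format", "format.v2")
all_formatters = board.lookup("format")
newest = board.latest("format")
```

A user who knows only the name of the switch-board can thus
obtain a specific name to place on an access list.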


7.3.4 Security

A unique descriptor represents a sort of capability (at
least for communication purposes), since possession of a
mailbox identifier provides the possessor with the potential
for sending messages and requests to the process bound to
the identifier. However, if the target mailbox does not
have the sender on the access list, the message may be
discarded by the system, thus essentially controlling
communication through the maintenance and enforcement of the
"Access_List." It is clear, therefore, that security issues
revolve around the ability to control changes to the
"Access_List," an issue already explored by file system
designers.

If one takes the view that a message is an attempt to access
an object by a principal [GRAH 72], then this facility
contains all the elements of the access matrix model [LAMP
71] of protection. By having different processes act as
monitors of objects one has a formalization of the access
model, since the identification of the accessor and the
object being sought are both available to the monitor
process.


7.3.5  Synchronization 

The availability of the sender's identification coupled with
the access control list provides the means to achieve
solutions to synchronization of processes and to detection
of boolean combinations of events. Creation of a mailbox
with only one process name on the "Access_List" provides the
facilities for a simple "pipe" (one way communication
channel) that can be used to construct a self clocking
"pipeline" with the "send" and "receive" primitives.
Logical "or"-ing of the input from two processes, say A and
B, can be accomplished by simply including A and B on the
"Access_List." More complicated forms of synchronization
can be accomplished by creation of an intermediate process
that performs the appropriate level of demultiplexing.
Broadcast transmissions are simply achieved by iteration
over a set of available mailbox identifiers.
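
Two of these constructions, the logical "or" and the
broadcast, are short enough to show directly. The Python
sketch below uses a deliberately reduced, non-blocking
mailbox; the class Mailbox and the function broadcast are
hypothetical names, not part of the proposal.

```python
from collections import deque


class Mailbox:
    """Reduced mailbox: an access list and a queue of (sender, buf) pairs."""

    def __init__(self, access_list):
        self.acl = set(access_list)
        self.mail = deque()

    def send(self, sender, buf):
        if sender in self.acl:          # others are silently discarded
            self.mail.append((sender, buf))

    def receive(self):
        return self.mail.popleft() if self.mail else None


def broadcast(sender, buf, mailboxes):
    """Broadcast by iterating over a set of available mailboxes."""
    for box in mailboxes:
        box.send(sender, buf)


# Logical "or": a receive completes when either A or B has sent.
either = Mailbox({"A", "B"})
either.send("B", "done")
event = either.receive()

# Broadcast: only mailboxes that list the sender accept the message.
boxes = [Mailbox({"P"}), Mailbox({"P", "Q"}), Mailbox({"Q"})]
broadcast("P", "shutdown", boxes)
delivered = [box.receive() for box in boxes]
```

Because the sender's name accompanies each message, the
receiver of the "or" can still tell which of the two events
occurred.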


7.3.6 Fault Tolerant Aspects

There appear to be many forms of communication errors that
are recoverable by the technology underlying the IPC level.
Failure of underlying mechanisms can easily be reported to a
process if it opens a channel for that purpose by including
the name of a system process on the "Access_List" of an
already opened mailbox, or by opening one for just that
purpose. It seems to me that users who do not want to be
concerned with error handling should not be forced to carry
along a lot of extra apparatus for those who do. One nagging
concern of mine is whether the system should force error
messages (especially for timeouts) into mailboxes that have
not included the communication monitor on the "Access_List."

Positive acknowledgement is purposefully not included in
this scheme, but is left to the user to construct by
setting up a duplex path between processes. As an aid, the
design of the "create" primitive must have a value "T" for
the maximum time between messages. Since the primitives are
designed to be used over a wide range of situations, most
applications will have some knowledge of how long it is
reasonable to wait for a reply or input from a cooperating
process.


7.3.7 Summary

A set of primitives for interprocess communication has been
proposed that seems suitable for implementation in a wide
variety of circumstances. Only briefly mentioned, however,
is the issue of process addressability when communication is
desired between several processes. The solution of this
problem requires the development of a name space
architecture that tackles the relationship between files,
devices, processes, users, and many other system objects, a
task certainly beyond the scope of this short note.


7.4 A BRIEF DESCRIPTION OF DSYS-PLITS


by

James R. Low
University of Rochester

The  si  iamayiaiian 

The model of interprocess communication that we use in
DSYS-PLITS has evolved from that used in the RIG (Rochester
Intelligent Gateway) Operating System. Basically, we think
of a program as being composed of several independent
processes (we call them "modules") communicating with each
other only through messages. There is no directly shared
memory. Processes are relatively stable, and to "fork" a
process means to create a totally new environment
independent from that of the creator. Our basic model does
not force any hierarchy on the processes, though it is
relatively easy for a programmer to think in terms of
hierarchies if he wishes.

DSYS (Distributed System)

DSYS is basically a set of facilities added to existing
programming languages and operating systems to support
inter-process communication across a network of
heterogeneous machines (DEC PDP-10 running DECsystem-10,
Data General ECLIPSEs running RIG, and XEROX ALTOs running
the ALTO operating system). DSYS consists of operating
system interfaces and user interface procedures.

Processes communicate via messages. The SEND primitive
supported by DSYS takes three parameters: the message to be
sent, the process identifier of the destination (originally
obtained through interactions with a name service process,
or provided in a message from some other process), and a
transaction key (analogous to a "port"). All connections
between processes are implicit. If a process has obtained
another process's name it can send that process a message
without any explicit "open" command. Of course, the
processes themselves may ignore messages which do not
conform to higher level (user-specified) protocols.
Transaction keys are used to separate various conversation
streams. DSYS will guarantee that all messages with a
specific transaction key sent from one particular process to
another will arrive in the proper order. No guarantee is
made about messages with different transaction keys.
Details of the reliable transmission and flow-control
mechanisms in the DSYS subnet may cause messages from one
process to another with different keys to arrive in a
different order than they were SENT.

Selective reception of messages is provided. A process may
state that it wishes to receive only messages from a
specific set of other processes or about specific
transaction keys. Thus the general form of RECEIVE is


RECEIVE msg FROM (sndr1, sndr2, ... sndr3)
ABOUT (trns1, trns2, ...)

If there is more than one message that has suitable SENDER
and TRANSACTION, an arbitrary one is selected (subject to
the constraint of ordering within a SENDER-TRANSACTION pair
mentioned above). If the user wishes to enforce more
general priority mechanisms he may use the PENDING construct
to see if there are suitable high priority messages before
he receives lower priority ones. PENDING takes the same
arguments as RECEIVE and returns TRUE if there are suitable
messages and FALSE otherwise. It does not actually perform
the RECEIVE, so the message queues are left intact. If all
else fails and the user wants more general reception
criteria, then he can ask to receive all messages and then do
his own local queueing. We believe this to be very rare and
have not seen this done in the programs coded so far.

DSYS performs all queue management, reliable transmission,
and flow control. Application programs are notified of
exceptional conditions (communication lines going down, other
processes in the "distributed job" breaking) via emergency
messages.
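
Selective reception, PENDING, and the ordering guarantee
within a sender-transaction pair can be illustrated with a
small Python model. It is a sketch only: the class Inbox and
its methods are hypothetical, arrival order is taken to be
list order, and receive returns None where the real RECEIVE
would wait for a suitable message.

```python
class Inbox:
    """Hypothetical per-process message store kept in arrival order."""

    def __init__(self):
        self.messages = []   # (sender, transaction key, message)

    def deliver(self, sender, key, msg):
        self.messages.append((sender, key, msg))

    def _find(self, senders, keys):
        for i, (sender, key, _) in enumerate(self.messages):
            if (senders is None or sender in senders) and (keys is None or key in keys):
                return i
        return None

    def pending(self, senders=None, keys=None):
        """True if a suitable message exists; the queue is left intact."""
        return self._find(senders, keys) is not None

    def receive(self, senders=None, keys=None):
        """Remove and return the earliest suitable message, or None if there is none."""
        i = self._find(senders, keys)
        return None if i is None else self.messages.pop(i)


inbox = Inbox()
inbox.deliver("editor", "routine", "save")
inbox.deliver("editor", "urgent", "abort")
inbox.deliver("editor", "routine", "close")

# Priority by hand: take urgent traffic first if any is pending.
if inbox.pending(keys={"urgent"}):
    first = inbox.receive(keys={"urgent"})
else:
    first = inbox.receive()
rest = [inbox.receive(senders={"editor"}), inbox.receive(senders={"editor"})]
```

Taking the earliest match is one way to honor the ordering
constraint: messages from the same sender with the same key
always come out in the order they arrived, while messages
with different keys may overtake one another.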

E.L1I5.  U£lL.S.:ia£jL 

DSYS itself considers messages as just strings of bits. We
have found it desirable to provide higher level message
support to applications programs. This higher level message
support is called PLITS.

Traditionally, fixed message formats have been used for
application programs. To design a new message type, a
programmer would lay out an explicit template for his data.
He would have to state the number of pieces of data, their
data-types, the external representation of the data type,
and the translation routines to use to translate between the
external (used in messages) representation and the internal
(used in his program variables) representation of the data.

In PLITS, we try to remove the burden of message template
design. By automating the process we also remove one class
of possible errors. In PLITS, the applications programmer
sees a message as a set of keyword-value pairs. We call
these pairs "slots". To construct a message he specifies
the particular set of slots he desires. The receiver can
determine (for individual messages) which slots are present
and their values. Thus, a message to a file server might
look like:


SEND (action ~ openfile, mode ~ update, name ~ "MYFILE",
      directory ~ "<mydir>", initialposition ~ 0, bytesize ~ 8)
   TO FileServer ABOUT OPNTransaction;

"action", "mode", "name" and so forth are the keywords (or
slotnames). The message would be identical as far as the
receiver were concerned if the sender had specified a
different order of the slots. We do not require that every
message contain a specific set of slots, but of course it is
an error if a process attempts to fetch the value of a
nonexistent slot. Defaults may be easily implemented using the
PRESENT IN primitive. For example, the file server above
might wish to assume that the directory is "<SYSTEM>" if
none is specified.


RECEIVE msg FROM ANYSENDER ABOUT ANYTRANSACTION;

IF NOT (directory PRESENT IN msg) THEN
    PUT (directory ~ "<SYSTEM>") IN msg;

thedirect := msg.directory;
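
As a rough analogy only, the slot view of a message can be
mimicked with a Python dictionary. The helper names
make_message, put_default, and fetch below are hypothetical and
are not part of PLITS; the sketch shows just the three
properties described above: slot order is irrelevant, fetching
a missing slot is an error, and a default can be supplied by
testing for presence first.

```python
def make_message(**slots):
    # A message is just a set of keyword-value pairs ("slots").
    return dict(slots)


def put_default(msg, slot, default):
    # Mirrors: IF NOT (slot PRESENT IN msg) THEN PUT (slot ~ default) IN msg
    if slot not in msg:
        msg[slot] = default
    return msg


def fetch(msg, slot):
    # Fetching the value of a nonexistent slot is an error.
    if slot not in msg:
        raise KeyError(f"slot {slot!r} is not present in the message")
    return msg[slot]


m1 = make_message(action="openfile", mode="update", name="MYFILE")
m2 = make_message(name="MYFILE", action="openfile", mode="update")
same_message = (m1 == m2)  # slot order does not matter to the receiver

put_default(m1, "directory", "<SYSTEM>")
thedirect = fetch(m1, "directory")
```

A dictionary carries no declared slot types, so the type
checking that PLITS performs on reception is not modelled here.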


When  a  user  wants  to  use  a  slot  in  his  program  he  must 
declare  the  keyword  and  the  type  of  its  value  both  in  the 
sending  and  receiving  process. 


STRING  SLOT  filename; 
MODULE  SLOT  continuation; 


In the existing implementation of PLITS (see below) the
data-type of each slot is sent in the message and
consistency is checked during the translation from the
external format of messages to the internal format of messages
during reception of the message. Implementation is underway
to have a "loading" time (when a process joins a
"distributed job") when the consistency of slot definitions
would be checked. Small identifiers for each slot would
also be given at this time. This would decrease the
overhead of the slot mechanism (currently, in addition to the
data, a type code and a character string are sent for each
slot).

In the current implementation the "data-type" of a slot
implies the external representation of the value of the slot
within messages. Thus we have several INTEGER types:


INTEGER16 SLOT small;
INTEGER32 SLOT large;

Status of Implementation

The DSYS has been running since last Spring on the PDP-10
and ECLIPSE computers. A distributed vision application was
encoded this past Summer. Recently an ALTO DSYS support
package has been used to link ALTOs to the ECLIPSE. The
PLITS message format has been running on the PDP-10 for over
a year (using a preliminary version of DSYS that ran only on
the PDP-10). A design for the support facilities necessary
for PLITS on the ECLIPSEs and ALTOs has been completed.

Almost all the support software has been written either in
SAIL (on the PDP-10) or BCPL (on the ECLIPSEs or ALTOs).

7.5  MODELS OF CONCURRENT COMMUNICATION ACTIVITY


PARAMETRIC  MODELS  OF  CONCURRENT  COMMUNICATION  ACTIVITY 

by 

Bill  Buckles 

General  Research  Corporation 


INTRODUCTION 


Using  a  distributed  system  to  feign,  simulate,  or  emulate  a  second 
distributed  system  is  of  interest  primarily  to  those  engaged  in  design.  The 
principal problem in this approach is the inherent timing discrepancies between
the existing and target systems. Lamport [1] has made invaluable contributions
applicable  to  this  area  and  this  study  is  directed  at  specializing  his  results 
to  emulation. 

MODELS  AND  STATES 


The  goals  are  to  determine  (1)  what  aspects  of  communication  behavior 
can  be  observed  from  an  emulation?  (2)  what  ancillary  relationships  must  be 
embedded  in  an  emulation  to  assure  that  the  primary  behavioral  attributes  can 
be  extracted?  and  (3)  if  the  ancillary  relationships  are  not  exact,  how  much 
confidence  may  we  place  in  the  extracted  primary  behavioral  attributes?  In 
order  to  achieve  this,  a  definition  of  process  state  has  been  derived  that 
deals  only  with  aspects  of  inter-process  communication.  The  target  process 
state  is  distinct  from  the  emulation  process  state,  but  the  former  is  embedded 
within the latter. Additionally, a progression of six communication models has
been defined, each an elaboration of the previous one.

Model 1 is a single process emulating itself. It may be schematically
represented as

[Schematic: a timeline of successive intervals $\Delta t_1$, $\Delta t_2$, $\Delta t_3$, $\Delta t_4$, with a message $m$ in each even interval]

where $\Delta t_i$ denotes a time interval, $m$ a message, and the even intervals denote
active communication periods. Model 2 is a single process emulating a second
process with uniform time distortion (either rate increase or decrease). Model 3
is a single process emulating a second process with both uniform time distortion
and non-uniform perturbations (strictly slow-down). In this model, the emulation
process may contain more periods than the target process. However, there must
exist an order-preserving mapping from the target process periods to the emulation
process periods. Model 4 advances to multiple processes with equal time distortions
and perturbations. Model 5 relaxes the equality constraints on distortions and
perturbations, but requires the two be balanced. That is, inequality among the
time distortions of various processes must be offset by perturbation. Model 6
is completely unconstrained with respect to both distortion and perturbation.


The state of a single target process, $i$, at time period $j$ is denoted by
the pair $s_{ij} = [\Delta t, \eta]$ where $\Delta t$ is the duration of the most recently completed
period and $\eta$ is the information sent or received. The state of the target
system is denoted $S = [s_{1j_1}, s_{2j_2}, \ldots, s_{nj_n}]$. The state of a single emulation
process $i$ after time period $k$ is denoted by the 5-tuple $\sigma_{ik} = [s_{ij}, \Delta t', \mu, r, p(k)]$
where $s_{ij}$ is the state of the target process, $\Delta t'$ is the duration of the most
recently completed period, $\mu$ is the information sent or received during the last
period, $r$, a constant, is the uniform time distortion, and $p(k)$ is the
instantaneous perturbation at the beginning of the current period. A system
state is denoted by $\Sigma = [\sigma_{1k_1}, \sigma_{2k_2}, \ldots, \sigma_{nk_n}]$. A system state change occurs
when exactly one $\sigma_{ik_i}$ assumes a new value.


PRELIMINARY  RESULTS 


Time  models  are  inherently  continuous  while  the  state  model  described 
above  is  discrete.  Lower  and  upper  bounds  on  the  time  relationships  are 
desirable  to  fix  the  amount  of  error  between  state  changes.  Because  r  (the 
distortion)  is  constant,  only  p  (the  perturbation)  may  introduce  error: 

$$\mathrm{glb}(p) = \sigma(n)\left[1 - \left(\Delta t'_n \Big/ \sum_{i=1} \Delta t'_i\right)\right]$$

$$\mathrm{lub}(p) = \sigma(n) + \left[\Delta t_{v+1} \Big/ \sum^{n-1} \Delta t'_i\right]$$

Unfortunately, lub(p) requires the prediction of the period duration, $\Delta t_{v+1}$,
of a current target process. An assumed order-preserving mapping illustrating
the lower and upper bound errors follows.

[Figure: lub Example]


Model 6, being the most general, is of interest. For example, it is necessary
to determine what measures must be taken to preserve the state transition
ordering in the emulation so that it accurately reflects the state transition
ordering in the target process. If $S_a < S_b$ in time and the transition to $S_a$
is embedded in $\Sigma_x$ and the transition to $S_b$ is embedded in $\Sigma_y$ then we would
desire that $\Sigma_x < \Sigma_y$. Let $\sigma_{ij}$ be the specific substate that changes value at
$\Sigma_x$ and $\sigma_{km}$ be the specific substate that changes value at $\Sigma_y$. Both
$S_a < S_b$ and $\Sigma_x < \Sigma_y$ if


y-1  x-1 

y^ij  Tw  ^y^ij  x^km^ 

w»x  1  J  w-0 

where  *  •  „°„„(p(v))  •  a  (r)  and  t 

p  qv  p  qv  p  qv  w 

time  in  period  w-1.  In  symbols: 

tw  -  r  •  o^CpCw)  •  0^(1  (At)). 


T 

W 

is  the  normalized  elapsed  emulation 

CONCLUSIONS

These and other relationships dealing with the communication behavior
of emulation processes have been formally proved. Some knowledge of the problem
of what information to collect and how to analyze it has been gained. It is
believed that future investigation will further strengthen the utility of the
models.

7.6  mas  i££  mmant  turn


by 

Robert  L.  Gordon 
and 

Jack  A.  Teat 

The enclosed Prime research note is partly based upon a
couple of early 1978 internal Prime R&D meetings concerned
with "Task Control and Communication for Multiple Processor
Systems". It discusses the synchronization and interprocess
communication mechanisms used in a number of important
operating systems and explores the importance of these
mechanisms for the development of future computer systems.
It is offered as additional material for the current
techniques and experience section of the conference report,
since it summarizes a review of mechanisms used in several
well known systems.


7.6.1  Introduction

Two in-house meetings concerned with "Task Control and
Communication for Multiple Processor Systems" were held on
January 11, 1978, and March 22, 1978. The purpose of the
meetings was to provide a forum for the discussion of
existing operating system mechanisms for process management and
interprocess communication as related to Prime's efforts in
process-based computer network architectures.

The two meetings consisted of a series of informal
presentations by members of Prime's R&D staff on other
systems and discussions on related PRIMENET communication
meetings. The particular topics were: (1) "Process
Communication In DEMOS", (2) "Process Control And Communication
In UNIX", (3) "TANDEM And VAX Process Structure", (4) "The
Multics IPC Facility", (5) "Event Counting And Sequencing In
Distributed Systems", and (6) "Communication Primitives For
PRIMOS".

The purpose of this note is to discuss the synchronization
and interprocess communication mechanisms developed for the
systems mentioned above and to explore future directions in
the development of process-based computer networks.
Observations concerning the IPC facilities of the operating
systems discussed are based upon the authors' knowledge of
the systems, available literature, and the Prime Conference
talks. Accordingly, Section II of this note presents brief
summaries of the IPC facilities, and Section III states some
conclusions and future directions. The References &
Selected Readings, at the end of this note, lists several
articles pertinent to the study of interprocess
communications.

7.6.2  Synchronization/IPC Facilities

Included in this section are discussions of the
synchronization/IPC mechanisms developed for the systems
mentioned in the Introduction. For additional information
regarding each system, refer to any of the pertinent
references.


7.6.2.1  Process  Communication  In  DEMOS 

DEMOS is an operating system under development at the Los
Alamos Scientific Laboratory for the CRAY-1 computer [BASK
77]. A task or process in DEMOS consists of a program and
its associated state information, which includes a link
table. The primary mechanism for communicating between user
and operating system tasks is by passing messages over
links. Links are associated with, but maintained outside,
the address space of sender tasks and are essentially
one-way (simplex) communication paths. All operations on links
are performed by the kernel of the operating system, which
ensures their integrity.

Appropriate standard links are provided by the system for
user tasks requesting operating system services. These are
provided in an automatic and transparent way, one such
standard link being to a switchboard task. Switchboard tasks
can arrange to get two or more mutually cooperating
processes together, and since tasks may under certain conditions
pass link identification information as a message, dynamic
process networks may be easily constructed.

Links resemble capabilities, so their management must take
into account many of the well known difficulties of managing
capabilities. Some of these, such as lack of control over
link passing and link duplication, have been partially
alleviated by classifying links into specific types and
restricting specific operations to these types. Other
facilities include data segment links and channels that are
associated with links in order to provide facilities for
multiple event handling and windows into task address
spaces.

The communication mechanism of DEMOS is not pure in several
ways. First, data segments are an escape from communication
only by messages; and second, conditional receives and
channel interrupts provide an escape from the synchronization
provided only by message primitives. However, with proper
hardware support these escapes might not be necessary.

7.6.2.2  UNIX Process Control/Communication

The UNIX system was developed at Bell Telephone Laboratories
for the DEC 11/40, 45, and 70 minicomputers. The basic
literature reference to the system [RITC 74] provides a good
explanation of the principal ideas incorporated in the UNIX
design.

In UNIX, a "process" is defined to be the execution of an
"image", where an image is a computer execution environment,
namely: allocated core, register values, open files, etc.
Images are small in UNIX, roughly 32K words + status
information, and the system is oriented around their execution
and manipulation.

Processes are organized in a parent-child tree-structure
within the UNIX system environment. Parent processes can
spawn (create) child processes dynamically through a fork
system call. Initially, the child process is a copy of the
parent process but with a different return value from the
fork call. The child inherits the parent's environment
(i.e., open files, register values, etc.) but does possess
its own memory image. Typically, a child process will
initiate an exec system call which will overlay the child
image with the startup image of a program named in the
call. In this manner, a parent process can create any child
process it desires.

The main form of communication between parent and child
processes is accomplished through pipes created by the
parent process. Since the parent's environment is lost when
a child process overlays itself, the pipe descriptor must be
passed as an argument to the overlaying "exec" system call.
Pipes serve as serial data paths with one "write end" and
one "read end". Multiple processes can write or read a
single pipe but data can be intermixed if the pipe is not
locked on writes. In addition to the pipe mechanism in the
original release of UNIX, new versions of the operating
system allow processes to communicate through messages that
are routed and queued for unique process identifications.
Messages in UNIX serve as a more general form of
interprocess communication than pipes since "unrelated"
processes can communicate using them. For mutual exclusion
and synchronization purposes, the UNIX system provides both
wait/signal and counting semaphores for use by user
processes.
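
The fork-and-pipe pattern described above can be sketched
with Python's os module on a present-day POSIX system. This is
a minimal illustration under stated assumptions, not code from
the UNIX release discussed in this note: the function name
run_child_and_collect is hypothetical, and the exec step is
omitted so that the child simply writes to the pipe and exits.

```python
import os


def run_child_and_collect(payload: bytes) -> bytes:
    """Fork a child that writes payload into a pipe; return what the parent reads."""
    read_end, write_end = os.pipe()  # the parent creates the pipe before forking

    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0 here. It has its own copy of both descriptors.
        os.close(read_end)
        os.write(write_end, payload)
        os.close(write_end)
        os._exit(0)  # leave immediately without running parent cleanup code

    # Parent: fork returned the child's process id.
    os.close(write_end)  # otherwise the read below would never see end-of-file
    chunks = []
    while True:
        chunk = os.read(read_end, 4096)
        if not chunk:  # empty read means every write end has been closed
            break
        chunks.append(chunk)
    os.close(read_end)
    os.waitpid(pid, 0)  # reap the child
    return b"".join(chunks)


received = run_child_and_collect(b"hello from the child")
```

With a single writer the data arrive in order; the intermixing
problem mentioned above only appears when several processes
write to the same pipe without coordinating.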

There are a number of limitations to the current IPC
mechanisms available in UNIX. Specifically, pipes, because
of their serial nature, must be used carefully in order to
avoid mixed streams on the write end or lost streams on the
read end. In addition, the message mechanism in UNIX
requires the process-id of sending and receiving processes.
Unfortunately, this information is not available through any
system administered switchboard and must be handled by the
processes themselves in some arbitrary manner. The naming
of processes, therefore, is not adequately addressed in
UNIX.

In summary, the UNIX timesharing system provides a dynamic
and flexible process environment with a high degree of
modularity. Some notable shortcomings in the UNIX IPC
facility (in addition to the problems discussed above) are:
(1) the inability of a process to wait for multiple piped or
message inputs, (2) the small address space available per
process, admittedly a PDP-11 imposed limitation, and (3) the
lack of any network process management capability.


7.6.2.3  Interprocess Communication In TANDEM

The Guardian Operating System [BART 77] for the Tandem
Computers model 16 computer has as its foremost goal the
maintenance of a failure-tolerant computing environment.
Even though the underlying Tandem hardware consists of
multiple computers and multiple dual-ported I/O devices, the
operating system is designed to give the appearance to the
user of a unified system through the novel application of
several software abstractions.

The first abstraction provided is that of a process. Each
processor module may have one or more processes residing on
it; however, a process may not execute on any other processor
than the one it was initially created on. Each process in
the system has a unique identifier or process-id of the
form <cpu #, process #>, which allows it to be referenced
on a system wide basis.

Process synchronization primitives include counting
semaphores and process local event flags. Semaphores may be
used only for synchronization between processes within the
same processor and are typically used to control access to
resources such as resident memory buffers and message
control blocks. Event flags are predefined for up to eight
different events and are signalled within a processor
either by hardware events, such as device interrupts, or by a
signalling function. All event signals are queued so that they
are not lost if the event is signalled when a process is not
waiting on it, and a process may wait for the first of one
or more events via the function WAIT. Processes may also
specify a maximum time to block which, if exceeded, results
in the return of an error condition to the process that
requested it.

The message system used for communication between processes
residing on different processors uses five primitive
operations: LINK, LISTEN, READLINK, WRITELINK, and
BREAKLINK, to implement what can be best thought of as
dialogues between requestor/server pairs. Messages are
queued for processes and result in the setting of an event
flag for processes wanting to "LISTEN".

With the implementation of processes and messages, processor
boundaries effectively disappear. System wide access to I/O
devices is provided by the mechanism of process pairs. An
I/O process-pair consists of two cooperating processes
located in two different processors that control a
particular I/O device. One of the processes is considered
the "primary" one and the other the "backup" process. The
primary process handles requests sent to it but sends
information to the backup process via the message system in
order to assure that the backup process will have all the
information needed to take over control of the device in the
event of an I/O channel or device error. Because of the
distributed nature of the system, it is not possible to
provide a "block" of driver code that could be called
directly to access the device. While potentially more efficient,
such an approach would preclude access to every device in
the system by every process in the system.
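
The process-pair idea can be illustrated with a small Python
sketch. It is a toy model under stated assumptions, not the
Guardian implementation: the class names DeviceProcess and
ProcessPair are hypothetical, a dictionary update stands in for
the checkpoint message sent over the message system, and a
failure is simulated by clearing a flag.

```python
class DeviceProcess:
    """One member of an I/O process pair (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.state = {}    # what this process knows about the device
        self.alive = True  # cleared to simulate a failure


class ProcessPair:
    """Primary/backup pair controlling one device."""

    def __init__(self):
        self.primary = DeviceProcess("primary")
        self.backup = DeviceProcess("backup")

    def request(self, key, value):
        # If the primary has failed, the backup takes over control.
        if not self.primary.alive:
            self.primary, self.backup = self.backup, self.primary
        # Checkpoint to the backup first (stands in for a message),
        # so the backup always knows enough to take over.
        if self.backup.alive:
            self.backup.state[key] = value
        self.primary.state[key] = value
        return self.primary.name  # who actually served the request


pair = ProcessPair()
served_by_first = pair.request("block-1", "data")
pair.primary.alive = False  # simulate loss of the primary
served_by_second = pair.request("block-2", "more data")
state_after_takeover = dict(pair.primary.state)
```

After the simulated failure the former backup serves the second
request and already holds the state recorded by the first one,
which is the property the checkpoint messages are meant to
guarantee.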

Processes are not grouped in classical ancestry trees. No
process is considered subservient to any other process on
the basis of parentage, and two processes, one created by
the other, will be treated as equals by the system. When a
process "A" creates another process "B", via a call to the
procedure NEWPROCESS, no record of B is attached to A. The
only record kept is in process B, where the creation "id" of
A is saved and is known as B's "mom". When process B stops,
a STOP message is sent to process A. If B wants to know
whether A has stopped it must "adopt" its mom.

The innovative aspects of the Guardian Operating System lie
not in any new concepts, but in the synthesis of
pre-existing ideas. Of particular note are the low level
process and message abstractions. By using these, all
processor boundaries can be hidden from both the application
programs and most of the operating system. These initial
abstractions are the key to the system's ability to tolerate
failures and provide the configuration independence
necessary to run over a wide range of system sizes.


7.6.2.4  Process Communication In VAX

The VMS operating system architecture [DEC 77] supported by
the VAX hardware is a process structured system. Because of
this, the designers of VMS were motivated to look for and
evaluate the utilization of alternate process communication
schemes in order to ease the design and implementation of
VMS. It is significant that this study resulted in three
different mechanisms for process communication in order not
to force-fit applications into using any one particular
type.

The three interprocess communication facilities provided by
VMS are all software implemented. The first facility is
apparently used for trusted processes (e.g., kernel processes)
and consists of the notion of event flags, event flag
clusters, and masks that allow boolean combinations of event
flags. Since it is well known that this form of (semaphore)
type communication can be easily abused by naive users, it
apparently is restricted only to trusted processes.

The second type of interprocess communication used in VMS
(internal communication) consists of send/receive queues
that have implicitly associated event flags. This mechanism
serves as a way of passing variable quantities of data
between trusted processes with a fairly high degree of
efficiency. Each user process builds its own buffer (data
packet) and sends it to a "receive" queue, which then sets
the associated event flag for the receiving process.

The third type of interprocess communication mechanism
(generalized communication) consists of primitives for
handling mailboxes. Mailboxes can also be thought of and
implemented as queue or FIFO files; thus they can use the same
protection mechanisms as files. Of course mailboxes, like
files, can be classed as both temporary and permanent so
that interprocess communication can take place while
processes are "absent" or dormant, a useful feature for writing to
logged out terminals. In addition, processes communicate
with mailboxes in a fashion similar to record-oriented I/O,
thus providing a framework for advanced concepts such as I/O
redirection.

VAX/VMS supports not only processes, but also jobs that
constitute a collection of subprocesses and groups that are
sets of processes that share resources. Subprocesses can be
spawned and can have the rights of the creator as well as
the rights of the spawned image, thus allowing a form of
enhanced rights.

It seems that the VMS operating system provides a rich set
of interprocess communication primitives. Whether it is a
consistent set and can be managed over the life of the
system remains to be seen.


7.6.2.5  The Multics IPC Facility

The interprocess communication facility supported by the
Multics system is based upon the concept of event channels.
The primary purpose of an event channel is to provide
synchronization between processes.

Event channels (which can be thought of as numbered slots
in the ipc-facility tables) are either event-wait or
event-call channels. The event-wait channel receives events that
have occurred and awakens the process that established the
channel if it is blocked waiting for an event on that
channel. The event-call channel responds to the occurrence of an
event by calling a specified procedure if the process which
established the channel is blocked waiting for any event.

For events to be noticed by explicitly cooperating
processes, event channel identifier values are typically placed in
known locations of a shared segment. Processes can block
waiting for an event to occur or can explicitly check to see
if the event has occurred. If an event occurs before the
target process blocks, the process is immediately awakened
when it does block.

In summary, the event-channel facility in Multics provides a
flexible synchronization mechanism. Typically, processes
establish channels and wait for events on one or more of the
channels they have created. The utility of this approach is
clearly demonstrated by the use of the ipc-facility
throughout Multics for all user process coordination and
terminal I/O handling.


7.6.2.6  Event Counting and Sequencing

Synchronization of concurrent processes is usually required
for the relative ordering of events internal to each
process. Most currently favored synchronization techniques
such as monitors [HOAR 74] and semaphores involve mutual
exclusion, a technique that only indirectly notes the
occurrence of an event. An alternate set of synchronization
primitives has been proposed by Reed and Kanodia [REED 77]
where a process controls its synchrony with respect to other
processes by observing and signalling the occurrence of
events through operations on objects called eventcounts. An
eventcount is an abstraction representing the number of
events in some class of interest that have occurred.
Operations on eventcounts are: ADVANCE(E) - Signal one
event; READ(E) - Return the number of previous ADVANCEs on
E; and AWAIT(E,V) - Suspend a process until READ(E) >= V.
ADVANCE purely transmits information. READ and AWAIT purely
observe. In contrast the P operation on a semaphore is not
a pure observation primitive since it can modify the
semaphore. Pure observation or signalling primitives are
more attractive for use in secure systems [LAMP 73]. If
only one process executes ADVANCE operations on an
eventcount, ADVANCE and READ can be concurrent. If more
than one process does ADVANCEs, a different eventcount can
be given to each process, and the sum of those eventcounts
gives the total number of events in the class.

When mutual exclusion is needed (when events must be ordered
dynamically, such that the ordering is not known in
advance), a sequencer can be used. A sequencer operates like
the ticket machine in a bakery, and has one operation called
TICKET, that returns the number of previous ticket
operations on that sequencer. An eventcount and a sequencer
can be used to implement a semaphore. Several eventcounts
and sequencers can be used to implement semaphores that
allow a process to wait for several different events.
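
The way an eventcount and a sequencer combine to give mutual
exclusion can be sketched in Python. This is an illustration
under stated assumptions rather than the construction given in
[REED 77]: the class names EventCount, Sequencer, and
TicketMutex are hypothetical, AWAIT is spelled await_ because
await is a reserved word, and the sketch leans on Python's own
locks internally purely for convenience.

```python
import threading


class EventCount:
    """Counts events; supports ADVANCE, READ, and AWAIT."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def advance(self):
        # ADVANCE(E): signal one event and wake every waiter.
        with self._cond:
            self._count += 1
            self._cond.notify_all()

    def read(self):
        # READ(E): number of ADVANCEs so far.
        with self._cond:
            return self._count

    def await_(self, value):
        # AWAIT(E, V): suspend until READ(E) >= V.
        with self._cond:
            self._cond.wait_for(lambda: self._count >= value)


class Sequencer:
    """Bakery ticket machine: TICKET returns 0, 1, 2, ... exactly once each."""

    def __init__(self):
        self._next = 0
        self._lock = threading.Lock()

    def ticket(self):
        with self._lock:
            t = self._next
            self._next += 1
            return t


class TicketMutex:
    """Mutual exclusion built from one sequencer and one eventcount."""

    def __init__(self):
        self._seq = Sequencer()
        self._done = EventCount()

    def __enter__(self):
        # Take a ticket, then wait until all earlier ticket holders have left.
        self._done.await_(self._seq.ticket())
        return self

    def __exit__(self, *exc):
        self._done.advance()  # admit the holder of the next ticket
        return False


counter = {"n": 0}
mutex = TicketMutex()


def worker():
    for _ in range(200):
        with mutex:
            counter["n"] += 1


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because tickets are issued in order and each exit admits
exactly the next ticket holder, entry to the critical section
is first-come, first-served.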

There seem to be at least two attractive advantages that
eventcounts have over alternate synchronization schemes
for distributed systems. The first advantage is that
the ADVANCE operation affords a natural broadcast mechanism
to all processes that might be waiting on an event, because
unlike simple semaphores the signaller need not know the
names of the intended observers. The second advantage is
the avoidance of mutual exclusion where only the relative
ordering of events is required, thus tending to limit the
amount of serialized code in systems, code that often
results in performance bottlenecks. Eventcounts and
sequencers could be used by an operating system, instead of
user-visible semaphores, for implementing more general
interprocess communication mechanisms with shared files, and
this mechanism could be made available to the user to
coordinate the use of shared resources.


7.6.2.7 Intertask Communication Primitives For PRIMOS

Several intertask communication capabilities currently exist
within the Prime operating system (PRIMOS). Both lock/unlock
and counting semaphores are implemented at the microcode
level and are available for system and user tasks. In
addition to these basic synchronization primitives for
communication between processes on the same processor, PRIMOS
supports a set of PRIMENET interprocess communication
capabilities based on X.25-flavored "virtual circuits".
These capabilities allow a user process to establish a
full-duplex virtual connection to another user process,
whether local or remote.

Virtual circuits can be managed at the user program level by
the proper use of a collection of subroutine calls to PRIMOS
and provide a "Level 3" X.25 Interprocess Communication
Facility (IPCF).

The major services provided are for forming a connection,
breaking a connection, and transmitting or receiving data.
Generally, two different forms of a service are provided.
The first form is an abbreviated calling sequence, in which
the user supplies only the minimum information needed to
establish and use a virtual circuit. The second form is a
more detailed one that allows a user full access to all
fields of the X.25 "Level 3" defined packet formats. The
latter form is intended primarily for users wishing to form
X.25 connections to non-Prime hosts on public data packet
networks.

Eleven network primitives currently compose PRIMENET and
provide capabilities to: establish status as a network user
(X$ASGN), establish a network connection (X$CONN), get local
connect information (X$GCON), accept a connection (X$ACPT),
clear a connection (X$CLR), hand off a connection (X$GVVC),
receive via a connection (X$RCV), transmit via a connection
(X$TRAN), wait on transmit or receive (X$WAIT), get network
status (X$STAT), and terminate network user status (X$UASN).
This set of PRIMENET primitives is based upon the X.25
protocol and is due for release under REV 17 of PRIMOS. The
chief shortcoming of the current PRIMENET set of primitives
is the inability to support multiple readers and/or multiple
writers per connection.

The addressability defined in the basic X.25 specifications
refers only to a single 14-digit address per host, although
it is not uncommon for a host (such as one running PRIMOS)
to handle multiple processes and users. Therefore, in order
to decide which user or operating system service should
control a connection, each incoming "call request packet" in
PRIMENET must specify a network "port." This port, coupled
with the 14-digit address of the target system, designates a
target process.

Each host in Ringnet has a pool of 255 available ports that
may be assigned to any process on a first-come, first-served
basis by a call on the operating system. However, only
ports 1 through 99 are available for users; the rest are
reserved for system use. Permanent port assignments to a
process are possible by controlling the order in which
processes are initiated just after system startup;
otherwise, there is no absolute guarantee that a particular
process is associated with a given port number.

The short form of the initial connection protocol uses an
ASCII host name (e.g. "ENG.15") instead of the long 14-digit
address, and a port number previously acquired by the target
process. The "connect" function is typical of the IPCF
primitives, and the request for it is shown as a partial
example of how a circuit is formed at the program level.

CALL X$CONN (VCID, PORT, ADR, ADRL, VC_STAT)

The variable ADR points to a string containing the name of
the intended host (i.e. ENG.15), ADRL contains the length of
the name (6), and VC_STAT represents the status of the
requested service. Upon completion of a successful
connection, a "virtual circuit identifier" (VCID) is
returned that can be used for the subsequent transmission of
data. Incoming calls for a particular port in a host are
queued on a first-come, first-served basis. Information
concerning a call request at the head of a port queue can be
obtained via a system call, so that connections can be
accepted, refused, cleared, etc. Calls are kept pending for
90 seconds, during which the requestor's status is that of
"connection in progress." Other X.25 services are provided
to users that allow for waiting on the completion of a
network event, accepting or clearing a call, passing off a
virtual circuit to another process in the same host, and
obtaining status information about a particular circuit.
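
The queueing of incoming calls on a port can be sketched as follows.
This is a hypothetical model, not the PRIMENET implementation: the
class PortQueue and its methods are invented for illustration, time
is passed in explicitly, and it is assumed that a call pending for
more than 90 seconds is simply dropped from the queue.

```python
from collections import deque

PENDING_SECONDS = 90  # from the text: calls are kept pending for 90 seconds


class PortQueue:
    """Hypothetical FIFO of incoming call requests for one port."""

    def __init__(self):
        self._calls = deque()  # entries are (arrival_time, caller)

    def call_request(self, caller, now):
        # Queue an incoming call, first come, first served.
        self._calls.append((now, caller))

    def _expire(self, now):
        # Assumption: calls pending longer than the limit are discarded.
        while self._calls and now - self._calls[0][0] > PENDING_SECONDS:
            self._calls.popleft()

    def head(self, now):
        # Information about the request at the head of the queue, if any.
        self._expire(now)
        return self._calls[0][1] if self._calls else None

    def accept(self, now):
        # Accept the head request and return its caller, or None.
        self._expire(now)
        return self._calls.popleft()[1] if self._calls else None
```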

At a level above the PRIMENET primitives, PRIMOS supports a
remote-login capability (RLOGIN) and a network file-access
method (FAM). The File Access Manager (FAM) is a PRIMOS
subsystem that extends the functions of the PRIMOS file
system to a network of hosts. Virtualization of the file
system is accomplished by permanently assigning a port (255)
to the local FAM process of each host, over which virtual
circuits to neighboring FAMs are used to accomplish remote
file operations on behalf of a user.

A FAM process in a host fields requests from local users for
file operations on remote hosts, handles incoming file
requests from remote hosts, and maintains status and update
information concerning the current state of network
connections and file system devices. When the PRIMOS
supervisor decides that a particular user request is
destined for a remote device, it queues the request for the
local FAM process and suspends the user. FAM packages this
request in a message and passes it off to the appropriate
remote FAM, which performs the requested file operations on
behalf of the user. The remote FAM process sends the
original request and the requested data back to the local
FAM, which copies the returned values into the user's
address space and causes the user to be rescheduled.
Because certain file primitives are guaranteed to be
"atomic" operations, all file functions are performed to
completion just as if they occurred locally, even if they
require multiple messages or updating of local supervisor
tables.

Since both local and remote operations on a particular file
are handled through the file system of the host that owns
the particular file, all of the normal file protection and
other mechanisms, such as locking a particular record while
writing, are automatically accomplished. Applications using
remote data as well as local data run without any change.

In a similar fashion, the ability of a user to "remotely
log in," as if their terminal were physically attached to
the host of their choice, is achieved by the operating
system multiplexing all remote terminal traffic through port
"0." When a user "logs in," they may designate a system to
be attached to as:

LOGIN SMITH -ON ENG.15

At this point the local login server establishes a virtual
circuit to the target host and requests the initiation of,
and connection to, a process in the remote host. From then
on the local terminal buffers are effectively diverted to
the input and output buffers of the remote process running
on the selected node.

A proposal for an implementation of pipes [SCHE 78] was
discussed as an alternative to virtual circuits. The pipe
mechanism does allow multiple readers and multiple writers
and thus, together with the X.25 PRIMENET, would facilitate
most applications that demand IPC facilities incorporating
multiple readers and writers.

In summary, the current PRIMOS interprocess communication
capabilities allow local and remote process cooperation
through X.25-flavored "virtual circuits", in addition to the
semaphore primitives for local communication. These
"point-to-point" mechanisms may not suffice for distributed
process applications demanding N-process protocols; however,
the set of applications demanding such protocols at this
time seems small.


7.6.3 Conclusions and Future Directions

As this report has illustrated, the process concept has
become increasingly central, in recent years, to the design
of computer systems at both the hardware and software
levels. There are many reasons for this development, two
important ones being: (1) the continuing decomposition of
systems and applications problems into sets of cooperating
parallel programs for greater modularity, functionality,
flexibility, and maintainability; and (2) the increasing
cheapness of processors and memory, allowing the assignment
of processes to processors in an economical way. As
processes have become "cheaper" to create, maintain, and
destroy, the flexibility, scope, power, and economy of
interprocess communication mechanisms have become
increasingly central to the effectiveness of multi-process
systems.

A wide variety of mechanisms for interprocess communication
has been surveyed in this report. Perhaps the major reason
for such a variety is the desire to provide in one set of
primitives: (1) flexible process synchronization tools,
(2) data transfer mechanisms, and (3) communication control
and error recovery. Some of the major issues involved in
the design of interprocess communication mechanisms are
briefly discussed below.


1. Naming: Many systems have inadequate facilities for
identifying names of processes within the same host, let
alone for processes residing on different hosts. Part of
the problem stems from an inconsistent view of the
relationship between the set of allowable names for files,
devices, processes, users, mailboxes, generic system
services, and specific system services. Until this problem
is settled, the design of specific interprocess
communication primitives cannot focus on the set of
fundamental objects that they will be dealing with. This is
a difficult issue, since it is here that many of the system
security issues are also addressed.

2. Control Of Links Between Processes: Control of
communication paths between processes fundamentally depends
upon the nature of process relationships. If process
relationships are tree structured, then the status of a
child's communication with other processes might be
monitored and controlled by the parent. On the other hand,
if each process wants to maintain the concept of
sovereignty, then the basic challenge is how to provide the
ability for cooperating processes to establish a monitor
process that is capable of controlling the communication
paths between them.

3. Control Of Data Flow Between Processes: The need for a
flexible set of operations to control data-flow between
processes is of major importance in the design of IPC
mechanisms. This issue involves providing processes with
the ability to: control multiple links, respond to
out-of-band signals, receive/transmit/flush stream and
message data types, and receive/transmit link capabilities.
A number of additional capabilities might also be
considered, such as allowing processes to define
data-type-links that facilitate the passing and manipulation
of complex data structures.

4. Synchronization Of Processes: Clearly, a major function
of interprocess communication is to provide either explicit
or implicit synchronization between processes. Early forms
of interprocess communication depended only on the correct
use of explicit synchronization primitives for sharing
sections of main memory. In some systems, temporary files
serve as synchronizing points between job steps (implicit),
while in other systems processes synchronize and exchange
data by signalling (explicit). Whether explicit or implicit
synchronization primitives should be provided is still very
much an open question.

With the advent of cheap communications and distributed
systems, these issues are becoming more important each day
to both the manufacturers and users of computer systems. A
workshop addressing IPC design is, therefore, scheduled to
be held in Atlanta, Georgia, on the 20-22 of November, that
will bring together a selected group of researchers in this
subject area to address the five general topics listed
below:

(1) Assess the present state-of-the-art for IPC mechanisms
in distributed data processing systems.

(2) Identify the data available on the actual performance
of various IPC policies and mechanisms.

(3) Assess the potential value of various IPC mechanisms
satisfying the operational and performance requirements for
highly distributed systems.

(4) Identify shortcomings in the present state-of-the-art
and identify promising areas for future research and
experiments on this subject.

(5) Identify possible standardization levels in IPC design.


Some of the issues the workshop is intending to examine in
detail are: addressing issues, hardware support, transport
mechanisms, flow control, out-of-band signalling, fault
tolerance, security, synchronization, and performance and
application programming impact. Prime Research is actively
participating in this workshop, which also has the support
of both the IEEE Computer Society and the three ACM Special
Interest Groups SIGOPS, SIGARCH, and SIGCOMM.

In conclusion, there are far-reaching ramifications to the
demand for, and the development of, interprocess
communication facilities and cheap processes. At the user
level, a greatly enhanced system functionality and
flexibility can be achieved, and at the operating system and
hardware levels, the need to efficiently support this
functionality is leading to new architectures and OS
designs. As the section on PRIMOS in this report suggests,
Prime is developing new IPC mechanisms for the enhancement
of current systems and is attempting to incorporate some of
the ideas developed in other systems. In addition, as new
computer architectures are explored at Prime, the need to
include hardware support for critical IPC functions is an
area that requires study and understanding.

DATA COMMUNICATION SOFTWARE
by
Bell Laboratories


Introduction

Distributed computing environments are based upon, and
wholly depend upon, data communications. Although there
exists a sizable and growing hardware technology for data
communication, software has not generally kept pace in
recent years. Better software tools and techniques are
needed in order to experiment with the new hardware devices
that are available in the laboratory as well as to improve
the capabilities for cooperation between our normally
monolithic operating systems. These notes outline the
direction and status of communication-oriented software
research within the context of the 7th edition of the UNIX
operating system.

Several software components are being experimented with in
computer systems at Murray Hill, including a PDP-11/45,
11/70's, an Interdata 8/32, and LSI-11's. Some of the
software is part of the UNIX kernel, or resident operating
system, and the remainder consists of programs that utilize
the new kernel facilities. The software components in the
kernel include:

1) primitives for managing intermediate-sized contiguous
areas of kernel data space,

2) a "packet driver" which can be used to impose framing,
sequencing, checksumming, and retransmission procedures on a
character device,

3) multiplexed and non-multiplexed interprocess
communication channels.

The salient characteristics of these components are
described in the next three sections. The organization of
the higher-level codes which use these components will not
be discussed here.


The previously existing space-management procedures in the
UNIX kernel were used to implement the terminal character
lists and the disk buffer cache. Since the size of an
allocation permitted by these routines is either one byte or
512 bytes, it is not surprising that an additional mechanism
was needed for data communications. There are but two
primitives needed: one to allocate and one to release. The
new primitives manage contiguous memory segments that are
some multiple of 32 bytes in size, up to a maximum of 512
bytes.

It was intended that the buffer management primitives be
fast enough to be invoked from within interrupt routines.
This means that recombination or garbage collection must
also be capable of being done at interrupt time. These
considerations lead to a strategy which employs a few
judiciously chosen bit-map tricks in conjunction with the
constant allocation sizes mentioned above.
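
A bit-map allocator of this general kind can be sketched as
follows. The sketch is an assumption about how such a scheme might
look, not the kernel code itself: the class name, the pool size, the
first-fit search, and the convention that the caller passes the
size back on release are all hypothetical. Only the 32-byte unit
and the 512-byte maximum come from the text.

```python
UNIT = 32          # allocation granule in bytes (from the text)
MAX_BYTES = 512    # largest single allocation (from the text)


class BitmapAllocator:
    """First-fit allocator over a pool of 32-byte units, one bit per unit."""

    def __init__(self, pool_units=64):   # pool size is an assumption
        self.pool_units = pool_units
        self.bitmap = 0                  # bit i set => unit i is in use

    def alloc(self, nbytes):
        if not 0 < nbytes <= MAX_BYTES:
            return None
        n = (nbytes + UNIT - 1) // UNIT  # round up to whole units
        mask = (1 << n) - 1
        for start in range(self.pool_units - n + 1):
            if (self.bitmap & (mask << start)) == 0:
                self.bitmap |= mask << start
                return start * UNIT      # byte offset into the pool
        return None                      # failure indication, never sleeps

    def free(self, offset, nbytes):
        n = (nbytes + UNIT - 1) // UNIT
        # Clearing bits recombines adjacent free space with no extra work.
        self.bitmap &= ~(((1 << n) - 1) << (offset // UNIT))
```

Because free space is represented only by zero bits, releasing a
segment merges it with its neighbours automatically, which is what
makes recombination cheap enough for interrupt time.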

The allocator may be called with a flag which directs
whether it should sleep when space is not available or
whether it should return a failure indication. This was
built in because the allocator must not be allowed to sleep
when called from an interrupt routine. However, it may be
equally distressing to have it fail. Current practice
involves building strict space bounds into interrupt
processes that cannot live with allocation failures. This
way space requirements are known in advance, and the
allocator is used to dedicate a private buffer pool where it
is needed.

Although the new space management primitives are useful for
allocating "ordinary" I/O buffers, their real usefulness is
in supporting the fifo queues needed for data rate balancing
between readers and writers. Because of the address-space
limitations of the PDP-11, memory is a critical resource,
and it is not possible to devote as much space to data
queues as many high-bandwidth applications require. As the
software described below matures, it will become necessary
to extend fifo mechanisms to secondary storage or to
non-kernel memory space. The methods used in the current
primitives can, and probably will, be applied in these other
circumstances.


Packet Driver

The packet driver consists of a group of routines similar in
name and function to the parts that make up the typewriter
control software; namely, there are open, close, read,
write, ioctl, read interrupt, and write interrupt entries.
A software switch, called the line-discipline switch, placed
at the proper locations in a character device driver selects
whether a call should be made to the standard system control
routines, or to the corresponding entries in the packet
driver or other line-discipline. This switch mechanism may
be thought of as a bidirectional filtering process which may
be selectively inserted between a device driver and a user
program.
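
The switch can be pictured as a table of entry points indexed by a
per-device discipline number. The sketch below is illustrative
only: the names LINESW, TtyDiscipline, PacketDiscipline and
CharDevice are hypothetical, only read and write entries are shown,
and the one-byte "frame" is a toy stand-in for real packet framing.

```python
class TtyDiscipline:
    """Stand-in for the standard terminal control routines."""

    def read(self, raw):
        return raw

    def write(self, data):
        return data


class PacketDiscipline:
    """Stand-in for the packet driver: adds and strips a toy 1-byte frame."""

    def read(self, raw):
        return raw[1:]                      # strip the toy header

    def write(self, data):
        return bytes([len(data) & 0xFF]) + data


# The switch: a table indexed by the line-discipline number of a device.
LINESW = [TtyDiscipline(), PacketDiscipline()]


class CharDevice:
    def __init__(self):
        self.line = 0                       # default discipline
        self.wire = b""                     # what the device last sent

    def ioctl_set_discipline(self, n):
        self.line = n

    def write(self, data):
        # The driver calls through the switch instead of a fixed routine.
        self.wire = LINESW[self.line].write(data)

    def read(self):
        return LINESW[self.line].read(self.wire)
```

The device driver itself is unchanged when a new discipline is
added; only the table grows.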

The packet driver is designed to operate character devices
in a packet mode with the error checking and flow controls
that are necessary for reliable data communication. The
implementation is organized so that flow control functions
are at a high level and are independent of framing and other
details of link control. This means that device
characteristics are transparent at the flow control level,
allowing the code to be used in different contexts - e.g.
with both bit-oriented and byte-oriented lines, or DMA and
non-DMA devices. Also, implementations exist for the UNIX
kernel, as a user-level subroutine package, and currently
for one non-UNIX system. Emphasis has been placed on
learning how to produce communication software that is
operating system-independent as well as machine-independent.
In practice this means that the packet driver
implementations listed above consist of protocol routines
which are common in all cases plus I/O and clock routines
which are system dependent. Since protocol changes
invariably affect only the common code, the logistics of
making network-wide improvements or repairs simplify to
updating a common file and reloading the appropriate system
programs.

There exist numerous link control and flow control
procedures; however, they were judged not suitable for our
uses for a variety of reasons. Some typical complaints are
that flow control procedures are not really end-to-end,
packet formats are complicated and verbose, requiring a fair
amount of real-time scanning, multiplexing is usually
defined in immutable ways, and error control, framing,
multiplexing, and flow control are usually mixed together
instead of being separated where possible. These
considerations led to the following:

1) flow control is based on a sliding "window" of
sequence-numbered packets. The numbers are modulo-8, the
maximum window size is 7, and the window sizes are
controlled by the receivers. The retransmission strategy
uses either "go-back-N" or selective single packet
retransmission at the receiver's discretion.

2) packet sizes and window sizes are negotiated between two
communicating packet drivers. The packet and window sizes
in each direction need not be the same.

3) packets may range in size from 32 bytes to a maximum of
4096 as determined by the formula 32 * (2 ** k) where k is
an integer, 0 <= k <= 7.

4) all message headers are the same size, unlike X.25 and
other similar protocols.

5) it is possible to multiplex the link at the packet level,
or within packets, or both.
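
The packet-size formula and the modulo-8 window arithmetic in the
list above can be sketched as follows. This is a simplified
illustration, not the packet driver's code: the Sender class, its
fields, and the use of cumulative acknowledgements are assumptions
made for the example.

```python
SEQ_MODULUS = 8     # sequence numbers are modulo 8 (from the text)
MAX_WINDOW = 7      # maximum window size (from the text)


def packet_size(k):
    # 32 * 2**k bytes for k = 0..7, i.e. 32 up to 4096.
    if not 0 <= k <= 7:
        raise ValueError("k must be in 0..7")
    return 32 * (2 ** k)


def in_window(seq, base, window):
    # True if seq is one of the `window` numbers starting at `base`.
    return (seq - base) % SEQ_MODULUS < window


class Sender:
    def __init__(self, window):
        if not 1 <= window <= MAX_WINDOW:
            raise ValueError("window must be in 1..7")
        self.window = window
        self.base = 0        # oldest unacknowledged sequence number
        self.next_seq = 0    # next sequence number to use

    def outstanding(self):
        return (self.next_seq - self.base) % SEQ_MODULUS

    def can_send(self):
        return self.outstanding() < self.window

    def send(self):
        seq = self.next_seq
        self.next_seq = (seq + 1) % SEQ_MODULUS
        return seq

    def ack(self, seq):
        # Assumed cumulative ack: everything up to and including seq.
        if in_window(seq, self.base, self.outstanding()):
            self.base = (seq + 1) % SEQ_MODULUS
```

For example, a sender with a window of 7 can emit packets 0 through
6 and must then wait; an acknowledgement of packet 2 frees three
slots.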

The software overhead of running the packet driver on 9600
baud lines is quite low. The implementation is efficient
enough that data rates exceeding 50K baud have been
demonstrated with this software using a PDP-11/45 and
non-DMA devices. As one would expect, the overhead at higher
data rates consumes the available cpu resources. For this
reason the packet driver is looked upon as an algorithmic
testbed and intermediate step toward improved computer
peripheral hardware for communications.


Interprocess  and  Process-device  Communication 

Multiple independent asynchronous data streams and events
comprise the greater part of the environment for data
communication software. It has been observed many times that
"blocking" I/O as implemented in the UNIX timesharing system
does not provide direct methods for dealing with these
entities, and there are sound architectural reasons why it
does not. Nevertheless, a process that must read from more
than one source should not have to wait on idle data sources,
since input data will be missed or delayed on lines that
are actively producing data while the process is blocked.
(It is assumed that polling techniques are unacceptable.)
Also, the flow-control scheme used throughout the system
causes a writer to block if the total amount of written data
exceeds a threshold. Such processes sleep until the
corresponding reader (process or device) consumes some or
all of the waiting data. A communications process typically
must write to several processes and/or lines at once. It is
somewhat inefficient to force such a process to block on a
"slow" device or process when there are other readers that
can be written to. Thus it would appear that an operating
system must provide techniques for dealing with asynchronism
and blocking or flow-control problems as well as supply a
useful means for establishing data paths between the
various data sources and sinks. The mechanism outlined below
accomplishes these immediate goals in a simple and direct
manner.

Two entities are defined: channels and multiplexed
channels, also called channel groups or groups due to the
similarity with existing notions in telephony. A channel
consists of a pair of full-duplex communication paths. One
path is designated as the "data" path and the other as the
"control" or "signaling" path. This architecture explicitly
recognizes the need for what is usually called "out-of-band"
signalling by dedicating a communication path for the
purpose. In the implementation, each path has some amount
of fifo or data queuing built into the transport mechanism.
However, the actual data transport is dealt with indirectly:
in order to avoid unnecessary copying of data from place to
place within the system, the data is placed somewhere using
a buffering mechanism, and tokens indicating where the data
can be found are passed from place to place. This
decoupling of the fifo and buffering functions from the data
transport mechanism increases the efficiency of data
movement and permits insertion or tuning of buffering
mechanisms in a transparent manner.

A channel can be thought of as a software null-modem: a
null-modem consists of two plugs connected by some wires
(fifo/buffering) so that data and signals transmitted at one
plug are received at the other and vice versa. In the
hardware world one may connect computers, computer
terminals, and various other digital devices to one another
via null-modems. In the software world one may attach
processes, devices, other channels, and groups (see below)
to the ends, or plugs, of a channel.

The multiplexed channel construct is a bundling mechanism
which supplies both a multiplexing discipline for merging
data from many channels and the inverse mechanism for
sending data to the individual channels in a bundle, or
group. ("Bundling" is a convenient term to describe a
construct which fans-in, fans-out, or otherwise merges data.
Examples include the PORT mechanism developed at RAND and
elsewhere, certain aspects of the C.mmp system, and the UNIX
timesharing system tee command.) A process can arrange to
have various devices and processes "plugged-in" to the ends
of channels and bundle all the opposite endings together in
a multiplexed channel, or group. In this way a read command
issued on the multiplexed channel will return any and all
data (up to the requested limit) available from all the
attached channels. This eliminates the blocking reader
problem mentioned above.

It is possible to bundle the multiplexed stream associated
with a group into another bundle, or super-bundle. This
allows tree-structured data path networks to be built up.
The maximum tree height and fan-in at each group are fixed
at 4 and 16 respectively. By numbering the channels bundled
into a group, a unique name for every possible tree node is
defined as the pathname, or sequence of channel numbers
encountered along a path from the "top," or root, of the
tree to any particular node. The pathname or sequence
numbering of a particular node is referred to as an index.
(An index is represented as a 16-bit quantity interpreted as
a sequence of 4-bit numbers.) All exchanges between the
operating system and a process owning channels and groups
are carried out using indices.
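
The packing of a pathname into a 16-bit index can be sketched as
below. The text fixes only the widths (four 4-bit numbers in 16
bits); the nibble order chosen here, with the root-most channel
number in the low-order nibble, and the function names are
assumptions. The sketch also does not encode the path length, so
the caller of unpack_index must supply the depth.

```python
MAX_DEPTH = 4     # maximum tree height (from the text)
FAN_IN = 16       # channels per group, so each number fits in 4 bits


def pack_index(path):
    # path: channel numbers from the root down, each in 0..15.
    if len(path) > MAX_DEPTH or any(not 0 <= c < FAN_IN for c in path):
        raise ValueError("path does not fit in a 16-bit index")
    index = 0
    for level, chan in enumerate(path):
        index |= chan << (4 * level)   # assumed: root-most in low nibble
    return index


def unpack_index(index, depth):
    # The depth must be supplied; this sketch does not encode it.
    return [(index >> (4 * level)) & 0xF for level in range(depth)]
```

For instance, the path 3, 10, 1 packs to 0x1A3 under this
assumed ordering.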

Multiplexed channels are created using the following C code:

fd = mpx("name", mode);

which has the same effect as creat("name", mode) in that
"name" is placed in the file system. In addition, reads and
writes on "fd" are translated by the operating system into
I/O operations on channels attached to the group.

I/O operations on a group are carried out via the standard
UNIX timesharing system calls:

cc = read(fd, buf, count);

cc = write(fd, buf, count);

The contents of "buf" are a concatenation of some number of
variable-length structures, each having the form of an index
followed by a byte count followed by the indicated number of
data bytes. (Control channel data is distinguished from
data channel data by an escape convention based on the
message byte count. If the count indicates a zero-length
message, then the actual byte count follows the zero and is in
turn followed by control channel data.) The "buf" formats
for reading and writing are identical, and in both cases
"cc" indicates the number of bytes actually transferred out
of a total request of "count" bytes. (Another form of write
is provided in which "buf" consists of indices, byte counts,
and pointers to the actual data. This format reduces the
buffer filling overhead on output and improves the
performance of certain programs.) On write operations, if "cc"
< "count" and the contents of "buf" were destined for more
than one channel, then it is known that at least one channel
fifo threshold was exceeded or some error condition was
encountered. Precise information can be obtained by reading
the group because the system immediately passes back status
information. The index numbers of blocked channels and the
number of data [...], one message for each blocked data channel.
When the previously written data is finally consumed,
another control message is sent to the group owner
indicating the readiness of a channel to accept data. These
"blocking" and "unblocking" messages allow a process to continue
to serve channels even though it temporarily cannot transmit
to all its channels. A complementary function is provided
whereby a process can enable or disable incoming data
transfers on selected channels.
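The record layout of "buf" can be sketched as a small parser. The field widths are not given in the text; this sketch assumes a 16-bit index and 16-bit byte counts in little-endian order with no padding, and the function names are hypothetical.

```python
import struct


def build_record(index, payload, control=False):
    """Build one record: index, byte count, data (assumed 16-bit fields)."""
    if not control and not payload:
        raise ValueError("a zero count is reserved for the control escape")
    header = struct.pack("<HH", index, 0 if control else len(payload))
    if control:
        # Escape convention: the real byte count follows the zero.
        header += struct.pack("<H", len(payload))
    return header + payload


def parse_mpx_buffer(buf):
    """Split a buffer into (index, is_control, payload) records."""
    records = []
    offset = 0
    while offset < len(buf):
        index, count = struct.unpack_from("<HH", buf, offset)
        offset += 4
        is_control = count == 0
        if is_control:
            (count,) = struct.unpack_from("<H", buf, offset)
            offset += 2
        payload = buf[offset:offset + count]
        if len(payload) != count:
            raise ValueError("truncated record")
        records.append((index, is_control, payload))
        offset += count
    return records
```

The same parser serves both directions, since the text states that the read and write formats are identical.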

If "d" is a character device file descriptor obtained via a
call resembling

d = open("/dev/name", 2);

then a channel can be created and the character device
attached to the channel by executing

ch = join(d, xfd);

where "xfd" is the file descriptor for the multiplexed
channel and "ch" is the new channel number.

Multiplexed channels may be joined or "bundled" to other
channels by using the join primitive as outlined above and
letting "d" be the file descriptor of a multiplexed channel.
There are additional primitives for "unbundling" and
manufacturing file descriptors that map into channels.
Moreover, the non-multiplexed file descriptors for channels
may be used as the standard input or output for any UNIX
program. (The multiplexed file descriptors provide direct
access to the control paths of channels, but this is not
meaningful for the non-multiplexed case. Currently, ioctl
commands on the non-multiplexed end of a channel are treated
as messages on the control path of the channel.)

The preceding discussion indicates how channels and devices
can be attached to groups. It remains to indicate how channels
are attached to processes. There are two techniques. One
involves using the extract primitive, which is a converse of
the join operation, to manufacture a file descriptor from a
channel. Using standard techniques found, for example, in
the UNIX shell, one arranges for an extracted file descriptor
to be the standard input and output for a new process by
executing UNIX close and dup calls, usually followed by
fork/exec. The second method has more interesting
properties: if "name" is the name of a group, then

fd = open("name", 2);

triggers the following sequence of events:

1)  the kernel notices that an open is being done
on a group rather than an ordinary file.

2)  if a new channel cannot be joined to the
group, or if the process which created the
group is no longer running, the open fails
immediately.

3)  otherwise, a message is sent on the control
channel of the group to the owner process
stating that an open was requested. The
effective UID of the opening process as well
as the index, x, of a new channel are
included in the message.

4)  the owner process may respond with either
attach(x) or detach(x), which respectively
complete the job of hooking channel x to
the group and returning file descriptor fd,
or cause the open to fail.
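The four steps can be modelled in a few lines. This is a user-level simulation only, not kernel code: the class, the `owner_policy` callback and the choice of the lowest free channel number are all assumptions, and the sketch returns the channel index where the real call would return a file descriptor.

```python
MAX_FAN_IN = 16   # assumed limit on channels per group


class Group:
    """Toy model of a group whose owner approves or rejects opens."""

    def __init__(self, owner_policy):
        self.owner_alive = True
        self.channels = set()
        self.owner_policy = owner_policy   # hypothetical: (uid, x) -> bool

    def open(self, uid):
        """Return the new channel index, or None if the open fails."""
        # Step 2: fail at once if no channel is free or the owner is gone.
        free = [x for x in range(MAX_FAN_IN) if x not in self.channels]
        if not free or not self.owner_alive:
            return None
        x = free[0]
        # Steps 3 and 4: the owner is told (uid, x) and answers with
        # attach(x), modelled as True, or detach(x), modelled as False.
        if self.owner_policy(uid, x):
            self.channels.add(x)
            return x
        return None
```

A policy such as `lambda uid, x: uid == 0` would let the owner admit only one user, which is the kind of access decision the message in step 3 makes possible.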


An open sequence as described above results in the creation
of a channel. The file descriptor returned to the process
executing the open will be "plugged-in" to one end of the
channel, and the other end of the channel will be attached
to the group. A read on the file descriptor will be
satisfied by writing on the channel through the group, and
conversely for writing on the file descriptor and reading the
group. An immediate application of this facility is in
implementing virtual terminals, or a "telnet server" as it is
called by the Arpanet community. A process first
establishes a group and arranges for one channel to be a
data path to a similar process running on another computer.
If the remote process sends a message asking that an
interactive environment be established, then the local
process forks, opens its own group, and starts up the shell
with the file descriptor returned from the open as the
standard input and output. Meanwhile the original local process
arranges to copy data from the newly created channel to the
remote computer and vice versa. Of course there are certain
niceties involving access permission, process groups, and
other details which are not explained here, but they can all
be handled neatly within the channel/group organization.

The method outlined above provides a form of "port"
facility. Its main disadvantage is that one must know a
port name. System or network-wide services would presumably
have well-known names, but it is important to have a class
of unbound names that the system can recognize. Interpretation
of such names might require searching for a remote
machine having a certain service facility or might require a
simple translation of some sort. In order to accomplish
this, a mechanism has been established whereby a multiplexed
channel may be designated as the unique interpreter for all
such unbound port names. In the operating system any open
requests on names containing "!" are treated as open
requests on the special channel. One use of this mechanism
is to treat "name1!name2" as a request for a file with name
name2 on a machine designated by name1. Since strings of
this form may be passed in to any program on the system, one
may write

diff machine1!file1 machine2!file2

and expect the UNIX diff command to be run with input from
machine1 and machine2.
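The interpreter process behind the special channel has to take such names apart. A minimal sketch follows; the function name is hypothetical, and splitting at the first "!" is an assumption, since the text does not say how names with several "!" characters are treated.

```python
def split_unbound_name(name):
    """Return (machine, file) for "machine!file", or None for a plain name."""
    if "!" not in name:
        return None   # ordinary file name: the normal open path applies
    machine, _, remote_file = name.partition("!")
    return machine, remote_file
```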

For some applications the bandwidth that can be achieved by
implementing data stream switching between channels in a
user process, implying a copy operation from the kernel to
the switch process and back to the kernel and then a final
copy to the destination process or device, may be quite
adequate. The primary example is the virtual terminal
scheme outlined above. However, this is not true for many
other applications, especially those involving file transfer
or file access. For these cases a connect primitive is
supplied which establishes a "short-circuit" connection in the
kernel between a channel and file descriptor. That is, at
the place in the operating system where data buffered in a
channel would be copied to a user process as part of a read
operation, the data is handled as though a write on the file
descriptor had been done. The connect primitive specifies
whether the symmetric short-circuit path is also meant to be
established, that is, whether writes on the file descriptor
should induce a direct copy to the agent reading the "other"
end of a channel. A disconnect operation is also provided
to break open short circuits.

The semantics of carrying out a normal open call on a
multiplexed channel name provide a useful range of interprocess
communication capabilities. This is what one expects from a
process communication system. However, by making slight
adjustments to the name recognition algorithms in the system, a
wider class of file names can be "trapped" by the open
routines in the kernel and passed as messages to a program
for further interpretation. This comprises a very powerful
mechanism for distributing system functions in interesting
and useful ways: once a channel has been established via
this name translation procedure, subsequent I/O on the
channel by the process can be redirected to other computers or
other processes at will and without modification to the
initiating program.


7.8  DISTRIBUTED IPC AND SIGNALLING


DISTRIBUTED  INTERPROCESS  COMMUNICATION  AND  SIGNALLING 


by 

G.  Le  Lann 
IRIA/SIRIUS


7.8.1  The General Context

Let us consider a system including several processors being
linked together through an interconnection structure. We
will distinguish between processors being accessed by
external users who wish to initiate activities and processors
which run these activities and may return results to some
external users. Initiation of activities, execution control
and transmission of data are accomplished through
transmission of messages. In the following, we will refer to these
processors respectively as senders and receivers of messages
(see figure 1). We will not make any assumption regarding
the size of these messages.

Our assumptions will be:

-  senders and receivers may be micro, mini or
maxi processors,

-  these processors may fail,

-  the interconnection structure is any resilient
hardware structure (using alternate routes in
telecommunication networks, multiple
busses/cables in multiprocessors/multicomputers,
radio frequencies, etc.),

-  errors, duplicates and losses are possible
during the transmission of messages,

-  message transit delays are variable,

-  there is no privileged processor in charge of
handling either communication or interprocessor
cooperation.


We would like first to describe some of the problems we see
to exist in such systems and, second, to present a solution.


7.8.2  The Problem


7.8.2.1  Multiple  Sender/Single  Receiver  Systems 

Let us consider a system as depicted in figure 1 but
including only one receiver. We can identify two different
problems:

i)  for any sender, it may be necessary to
maintain a strict sequencing of messages
being sent to the receiver

ii) the various message flows converging at the
receiver may have to be serviced by the
receiver according to a particular
discipline, which may be dynamically changed
and not be known statically or guessed by the
receiver.


Problem (i) is a problem of end-to-end signalling or
single-path signalling (sps). Solutions to the sps problem are
well known. The "window" technique is an example of such a
solution.

Problem (ii) raises the issue of multiple-path signalling
(mps), that is, the problem of serializing incoming messages
issued in parallel by different asynchronous sources. A
mechanism is needed whereby senders may remotely enforce a
particular serialization of messages at any time. For
example, this is needed when two senders A and B wish to
establish a particular ordering for initiating activities
(e.g., A before B).


7.8.2.2  Multiple Sender/Multiple Receiver Systems

Let us now consider a system including several receivers.
We will distinguish between two cases:

i)  Fully redundant systems

Major motivations for running several
identical receivers are to make the system
able to survive receiver failures, to provide
for a geographically dispersed but unique
activity visible from various locations
(receiver areas), or to relax constraints
regarding system maintenance.

The serialization of incoming messages
(either fortuitous or enforced) must be
unique for all receivers. This is an mps
problem.

ii) Partially redundant systems

These systems include several receivers
running activities which may be strictly
identical for some of the receivers, as well
as activities which are different for all
receivers.


In addition to the motivations already mentioned, other
reasons for considering such systems are to provide for
various activities being run in parallel and to allow for a
modular and dynamic growth of the system. In these systems,
an activity being initiated by a sender may span several
receivers. This raises the need for coordinating the
various individual serialization processes over these
receivers. Finally, according to user requests, the mapping
between senders and receivers, i.e. the need to set and
reset cooperation paths between senders and receivers, will
be constantly changing with time.

To summarize, we want to maintain a unique serialization of
incoming messages for those receivers which act as "twins."
In addition to this, we want to be able to achieve:

-  for every receiver, a specific and local
serialization of messages in step with the
dynamically changing subset of senders it is
cooperating with,

-  decentralized coordination between those
receivers which have to serialize messages
related to multi-receiver activities in order to
avoid conflicts between such activities.


This is again an mps problem.


7.8.3  Looking for a Solution: Requirements

Potential advantages of distributed computing systems are
numerous. However, it is not so simple to find a solution
to a particular design problem which does not annihilate
some of these advantages. A number of requirements which
are considered to be of primary importance for a
"distributed solution" to the mps problem are listed below.


7.8.3.1  Parallelism and Response Time

A solution should take full advantage of the parallel nature
of the system; parallelism in processing as well as in
communication may result in a good resource utilization ratio.
This has a non-negligible impact on system costs and
response time.


7.8.3.2  Resiliency

A solution should survive failures. Actually, we need a
more precise measurement of such a property which would
express the number of simultaneous failures a solution may
survive. This is the notion of resiliency.


7.8.3.3  Overhead

Costs of a solution may be low, monstrous, or acceptable.
It is necessary to evaluate overheads as regards traffic
(number and size of additional messages), processing
(handling of additional messages) and storage (for "control"
information).


7.8.3.4  Permanent Rejection

When conflicts occur (between "simultaneous" activities, for
example), how does a solution lend itself naturally to avoiding
infinite waiting, without resorting to any exotic or ad hoc
mechanism?


7.8.3.5  Fairness

Again, when conflicts occur, a solution should not
systematically favor the same processor(s).


7.8.3.6  Extensibility

If a solution may keep on working under dynamic system
reduction (failures), then it is necessary to show how this
solution matches the requirement of dynamic system
extension. What this means is that it should be possible to
reinsert or to add processors to the system without
disrupting the functioning of the system.


7.8.3.7  Simplicity

When the time has come to implement a system, problems of
understanding, specifying, debugging and maintaining the
software corresponding to a particular solution become
preponderant. This last requirement may well be one to look at
very carefully when considering building a real system.


7.8.4  A Solution

We have seen that an mps mechanism is needed if one wishes
communications between several senders and receivers to
exhibit some specific properties. Obviously, signalling in a
distributed system will be accomplished through the exchange
of messages, i.e. signalling will rely on communication.

This apparently recursive problem requires some structuring.
We will then assume that any convenient technique is used in
the system for solving the sps problem.

On top of this "layer," we will build our mps mechanism.


7.8.4.1  A Virtual Ring Structure

Sending processors are given permanent identities. If n is
the predicted maximum number of these processors, identities
will be integers belonging to the interval [0, n - 1]. As a
result, it is possible to view these processors as being
sequentially located along a virtual ring. Each processor i
has a well known predecessor and a well known successor,
i - 1 and i + 1 in the absence of failure (the marks - and +
stand for operations modulo n). There is no assumption made
regarding the mapping of processor identities on physical
addresses. In other words, a virtual ring structure does
not assume any particular physical topology.

As processors are located on a virtual ring, it is only
necessary for each of them to know the identity of its
respective predecessor (pred) and successor (suc).
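In the failure-free case the neighbour relation is plain modular arithmetic, as the following sketch shows. Passing n as an explicit parameter is a choice made here for illustration.

```python
def suc(i, n):
    """Successor of processor i on a failure-free ring of n identities."""
    return (i + 1) % n


def pred(i, n):
    """Predecessor of processor i on a failure-free ring of n identities."""
    return (i - 1) % n
```

Nothing in these functions refers to physical addresses, which matches the remark that the ring is purely virtual.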

A permanent and virtual communication path is established
between adjacent processors. A message sent on such a path
may travel over different physical links as provided by the
interconnection structure. Specific techniques may keep the
failure of a particular link transparent to processors.
However, occurrence of one or several failures may preclude
communication between adjacent processors. Detection of a
communication path breakdown as well as detection of a
processor failure can be achieved by using one of the
following techniques.

7.8.4.1.1  Mutual Suspicion

Every processor regularly sends "life messages" to its
successor on the ring. These messages should be acknowledged.
If the successor fails to return acknowledgements for a
given period of time, it is declared dead and its
predecessor undertakes a ring reconfiguration. Actually, there is
no difference between an abnormal behaviour of a successor
and a breakdown of a communication path. In both cases, the
successor should not be maintained on the ring.

Acknowledgement of life messages is bound to some internal
checking procedure which, if successful, indicates that the
processor is safe. In order to achieve correctness checking
transitivity along the ring, it is necessary to bind the
transmission of life messages to this checking procedure as
well.

Consequently, a processor cannot be returning
acknowledgements to its predecessor and fail in checking its
successor.

7.8.4.1.2  Explicit Message Acknowledgement

It may be required for messages sent over a communication
path to be acknowledged. A number of retransmissions are
allowed before deciding that the communication path is
broken. Numerous examples of protocols aimed at monitoring
transmission on various transmission media can be found in
the literature. They will not be detailed here. Also, it
may happen that messages are not acknowledged because the
successor has failed. As explained before, whatever the
case, that successor should not be kept on the ring any
longer.

Thus, every processor on the ring must be provided with a
reconfiguration protocol to be used every time a failure
leads to a ring breakdown. A simple example of such a
protocol is given below.


7.8.4.2  Ring Reconfiguration

Let us consider a situation where processor i and processor
i+2 are respectively predecessor and successor of processor
i+1 when this processor fails or when the communication path
between i and i+1 is broken. It is only necessary for
processor i to send to i+2 a specific message, to be
referred to as a reconfiguration message, meaning that from
now on the predecessor of processor i+2 is processor i. This
message must be acknowledged by i+2. If an acknowledgement
is not received by i after several attempts, i will send a
reconfiguration message to i+3, thus excluding i+2 from the
ring. The extreme situation is that of a ring including
only one processor.

Since the decision to initiate a reconfiguration is taken
exclusively by one processor for any particular failure, it is
easy to infer that no incoherence can arise because of the
exclusion of a processor from the ring. Because it is
required for a reconfiguration message to be acknowledged,
it is possible to devise some more elaborate scheme (for
instance, utilizing passwords) to avoid the possibility of
having a single faulty processor excluding all the others
from the ring. An example of a protocol using passwords is
given below.
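The simple reconfiguration rule can be sketched as a search for the next processor that acknowledges. The `responds` callback and the `max_attempts` value are hypothetical stand-ins for the message exchange and retry limit, and the sketch assumes the failed successor was i+1; no password scheme is modelled.

```python
def reconfigure(i, n, responds, max_attempts=3):
    """Return the new successor of i after its successor i + 1 failed.

    `responds(j)` models one attempt to send a reconfiguration message
    to processor j and reports whether an acknowledgement came back.
    """
    candidate = (i + 2) % n            # skip the failed successor
    while candidate != i:
        for _ in range(max_attempts):
            if responds(candidate):
                return candidate       # candidate now has i as predecessor
        candidate = (candidate + 1) % n    # exclude it, try the next one
    return i                           # extreme case: a ring of one processor
```

Only processor i runs this search for a given failure, which is the property the text relies on to rule out incoherent exclusions.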


7.8.4.3  The Extensibility Property

If processors are allowed either to fail or to leave, it
should be possible to reinsert on the ring a processor which
has been repaired or which decides that it is "on" again.
Also, we want it to be possible to expand the system while the
system is running. To this end, a three-party protocol is
needed such that the ring is always correctly configured.
This protocol must itself survive failures and should entail
as small a disturbance as possible. Let us assume that
processor j has to be inserted on the ring.

To this end, j must send a specific message, called an
"insert" message, containing its identity j to its potential
successor (j+1, j+2, ...). Let us assume that k is on the
ring. Processor k knows the identity of its current
predecessor. Let us assume that pred[k] is processor i.

Upon receiving such a message, k checks that the following
condition holds:

pred[k] < identity within insert message < k
(< is modulo n).

If it is so, k checks for an exchange of m life messages
with j and then sends to i a message meaning that i should
accept j as its new successor. This message contains a
password X. Upon reception of this request, i checks for an
exchange of m life messages with j. When this is completed,
i sends to k a "switch" message containing the password X.
This message is intended to avoid processors i and k being
fooled by a malicious processor j and it is also used as a
means to safely perform message transmission switching on
the new path (i, j, k) as explained below.

Upon receiving the "switch" message, k acknowledges it and
listens to j to detect the reception of a message containing
code X.

Upon receiving this acknowledgement, i performs the update
suc(i) := j; the first message to be sent to j is a message
including code X. This message and other subsequent
messages are passed on to k by j.

When receiving a message with code X, k updates pred[k]
with value j and then stops listening to i.

There is no interruption of message transmission on the
ring. If something goes wrong with j, no disturbance is
introduced on the existing ring. The message containing
code X is a good vehicle to maintain a FIFO message
transmission on the ring should this be required. There is no
special provision made to guarantee that loss of messages
does not occur between i and k just before or after
reconfiguration of the ring performed by k. Loss of control
messages is accepted on the ring and is harmless, as will be
shown later.

If transmission between i and j or between j and k turns out
to be impossible, then a normal ring reconfiguration is
undertaken.
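The insert condition checked by k and the resulting pointer updates can be sketched as follows. The life-message exchanges and the password X are left out, the ring is held in two hypothetical dictionaries `suc` and `pred`, and treating a ring of one processor as accepting any newcomer is an assumption not stated in the text.

```python
def cyclic_between(lower, x, upper, n):
    """True if x lies strictly between lower and upper, moving up modulo n."""
    span = (upper - lower) % n or n    # lower == upper: a ring of one
    return 0 < (x - lower) % n < span


def insert_processor(suc, pred, j, k, n):
    """Splice j between pred[k] and k if the insert condition holds."""
    i = pred[k]
    if not cyclic_between(i, j, k, n):
        return False
    # Order mirrors the text: i switches its successor to j first, then
    # k switches its predecessor once the message carrying X arrives.
    suc[i], suc[j] = j, k
    pred[j], pred[k] = i, j
    return True
```

For example, on a ring holding processors 1 and 5 with n = 8, an insert request from 3 addressed to 5 is accepted, while a later request from 7 addressed to 5 is refused because 7 does not lie between 3 and 5.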


7.8.4.4  The Control Token Mechanism

Cooperation between processors located on a virtual ring can
be achieved by providing them with some control privilege.
The solution suggested here is to have a particular message,
called the control token, circulating on the ring. Only
when holding the token should a processor be allowed to
initiate some specific activity. Upon completion, the token
is sent to the successor. Obviously, in case the token
is lost, it should be possible to regenerate it.

We begin by describing how the control token mechanism is
made resilient. Then, we show how this mechanism can be
used to solve the mps problem.

7.8.4.4.1  Resiliency

We assume that every processor owns a timer and that timer
values being used by the various processors on the ring are
not necessarily identical. Processors are allowed to read
headers of messages circulating on the ring.

Transmission of a token between adjacent processors is
monitored through a positive acknowledgement + retransmission
protocol. The token carries with it an integer value,
called the cycle number, which is incremented for every
complete revolution on the ring. This incrementation is
performed by processor x such that x > suc(x). At any time,
this processor is unique. Also, the numbering cycle to be
used should be chosen so that duplicate detection can be
performed safely. This is possible if maximum "hardware"
transit delays are known.

Timer values being used by processors correspond to the
expected round-trip time with the successor on the ring. A
timer is reset when the token has been acknowledged by the
successor.

Each processor keeps a record of the value (N) carried
within the token during its last visit. The next real token to
be received (not a duplicate) must carry value N + 1. When
the sender's timer expires, transmission is tried again, up
to a maximum number of attempts. Should this limit be
reached, a ring reconfiguration is undertaken. The token is
not lost.
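The cycle-number rule can be traced with a small simulation. It assumes a ring of at least two processors given as a sorted list, and it assumes that a processor records the value on arrival and that the wrap-around processor increments it on departure; the text does not fix these details, and the function names are hypothetical.

```python
def circulate(ring, cycle, hops, start=0):
    """Pass the token `hops` times along `ring` (sorted identities).

    Returns the final cycle number and the last value N recorded by
    each processor that the token visited.
    """
    last_seen = {}
    pos = start
    for _ in range(hops):
        x = ring[pos]
        last_seen[x] = cycle               # value N kept for later checks
        nxt = ring[(pos + 1) % len(ring)]
        if x > nxt:                        # x > suc(x): the wrap-around point
            cycle += 1
        pos = (pos + 1) % len(ring)
    return cycle, last_seen


def is_duplicate(n_last, value):
    """A real token must carry N + 1; anything else is treated as a duplicate."""
    return value != n_last + 1
```

With the ring [0, 2, 5], only processor 5 satisfies x > suc(x), so the cycle number grows by exactly one per revolution.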

If failure of a processor is noticed through the mutual
suspicion protocol, then it may be the case that the token
was held by the processor which failed. Detection of such
a situation and regeneration of the token can be performed
as follows.

Let h be the identity of the predecessor of the processor
which has failed and i the identity of the successor.
Processor h undertakes a ring reconfiguration. The
reconfiguration message carries with it value N(h), the last
token value known in h. Upon reception of this message,
processor i runs the following algorithm:

if (i > h and N(h) ≠ N(i)) or
(i < h and N(h) = N(i)) then

create token, N(i) := N(i) + 1.

With such an algorithm, it is possible to assert that a
token is never lost and that, at any time, there is only one
such token circulating on the ring (or zero for a finite and
hopefully short period of time).
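The test run by processor i can be written as a predicate. Reading the first comparison as an inequality is an assumption made here: it is the reading under which a token that passed h but never reached i is detected when no cycle increment lies between them. The function names are hypothetical.

```python
def must_regenerate(i, h, n_i, n_h):
    """Decide at processor i whether the token died with the failed processor.

    n_i and n_h are the last token values recorded by i and by h.
    """
    if i > h:
        # No wrap-around between h and i: differing records suggest the
        # token passed h and never reached i.
        return n_h != n_i
    # Wrap-around between h and i: the cycle number changes on this hop,
    # so equal records suggest the token left h and never reached i.
    return n_h == n_i


def regenerated_value(n_i):
    """Cycle number carried by the token that i creates."""
    return n_i + 1
```

When the predicate is false the token is taken to be elsewhere on the ring, and processor i only updates its predecessor.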

7. s. 4. 4. 2  fliiinJLiyisd  sianaiilas 

A  simple  way  to  achieve  a  specific  signalling  sequence  In  a 
distributed  system  is  to  have  the  processors  serializing 
themselves  so  that  at  any  time*  only  one  processor  Is  "ac¬ 
ting."  This  can  be  done  very  simply  by  using  the  control 
token  as  a  vehicle  to  achieve  mutual  exclusion  between  these 
processors.  However*  the  speed  of  this  signalling  technique 
is  very  much  dependant  on  the  time  spent  within  the  critical 
section.  The  problem  Is  that  very  often*  both  the  number 
and  the  nature  of  mutually  exclusive  actions  are  given 
beforehand  and  It  may  be  very  difficult  to  adjust  the  size 
of  the  critical  section  so  that  response  time  requirements 
are  matched.  Such  a  technique  could  slow  down  a  system 
art  1 f 1 ca lly. 

Instead  of  this*  it  Is  suggested  to  uncouple  completely  the 
signalling  mechanism  and  the  execution  of  the  critical  sec¬ 
tion.  As  a  result*  mutually  exclusive  actions  will  be 
initiated  in  parallel.  A  proper  sequencing  can  be  built  by 
assigning  identifiers  to  them.  The  control  token  will  be 
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used  for  the  purpose  of  distributing  sequenclal  Identifiers 
within  the  system.  These  sequential  Identifiers  will  be 
referred  to  as  tickets.  Every  message  Issued  by  a  sender 
must  be  ticketed. 

If we want receivers to service incoming messages according
to a purely sequential ordering, then we need one ticket
space per receiver category. In a fully redundant system,
we have only one category of identical receivers. One
ticket space is needed. In a partitioned or partially
redundant system, we need one ticket space for each
partition. Then, according to the system under
consideration, the token will carry either a ticket value or
an array of ticket values.
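
The ticket spaces described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name Token, the method acquire, and the partition names "all", "P1" and "P2" are hypothetical and do not come from the paper, and ticket values are left unbounded (no modulo numbering).

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Control token carrying one next-ticket counter per ticket space."""
    # Hypothetical layout: ticket space name -> next free ticket value.
    next_ticket: dict[str, int] = field(default_factory=dict)

    def acquire(self, space: str, count: int) -> list[int]:
        """Hand out `count` consecutive tickets from one ticket space."""
        start = self.next_ticket.get(space, 0)
        self.next_ticket[space] = start + count
        return list(range(start, start + count))


# Fully redundant system: a single ticket space shared by all receivers.
token = Token({"all": 0})
tickets_a = token.acquire("all", 2)   # sender A tickets two messages
tickets_b = token.acquire("all", 1)   # sender B is visited next on the ring

# Partitioned system: the token carries an array of ticket values,
# one per partition.
ptoken = Token({"P1": 0, "P2": 0})
p1 = ptoken.acquire("P1", 3)
p2 = ptoken.acquire("P2", 1)
```

Because a sender only draws tickets while it holds the token, no two senders can obtain the same ticket within one ticket space.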

It has been shown how the virtual ring + token structure can
survive failures. But ticket allocation must also be
resilient. To this end, one may require that a processor
should be either selecting tickets or using them, but not
both. What this means is that those tickets which are
selected by a processor should not be used until the token
has been acknowledged by the successor. As a consequence,
should a failure occur in the midst of ticket selection, the
correct ticket value or array of ticket values can be
regenerated with the token, exactly as is done for the
cycle number (see 7.8.4.4.1). Another issue is that of
failures interrupting processing at random. In particular,
what should be done with those messages which have been
issued by a processor which failed later on? Another problem
is what to do with tickets not being used because they were
held by a processor which died.

Actually, the whole issue would require a complete
discussion which is beyond the scope of this paper.

7.8.4.4.2.1 Fortuitous Serialization

i) Signalling within fully redundant systems

The broadcasting of a ticketed message to all receivers may
be done by the sender (parallel broadcasting). The usual
problem with this technique is that the sender may fail
while issuing messages. However, because tickets must be
sequential, it is simple for a receiver to detect such an
unsafe situation: a gap in the ticket sequence reveals a
missing message. A copy of the missing message may be
obtained from another receiver.
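
As a rough illustration of the detection step, the following Python sketch finds gaps in the ticket sequence and looks for copies at other receivers. The function names and the representation of a peer as a dictionary from ticket to message are assumptions made for the example, and modulo wrap-around of ticket values is ignored.

```python
def missing_tickets(last_processed: int, received: set[int]) -> list[int]:
    """Tickets that must exist (they are sequential) but were not received."""
    if not received:
        return []
    highest = max(received)
    return [t for t in range(last_processed + 1, highest) if t not in received]


def recover(missing: list[int], peers: list[dict[int, str]]) -> dict[int, str]:
    """Fetch each missing message from the first peer that holds a copy."""
    recovered = {}
    for ticket in missing:
        for peer_log in peers:
            if ticket in peer_log:
                recovered[ticket] = peer_log[ticket]
                break
    return recovered


# The receiver has processed ticket 4 and then received 5, 7 and 8:
# ticket 6 must have been issued, so its message is missing.
gap = missing_tickets(4, {5, 7, 8})
copies = recover(gap, [{5: "m5"}, {6: "m6", 7: "m7"}])
```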

Another approach to broadcasting consists in organizing
receivers along a virtual ring. This ring is intended to be
a resilient vehicle for message broadcasting. Only one copy
of a message must be created by the sender, which hands it
over to one of the receivers. This receiver is then in
charge of initiating the revolution of the message on the
ring.

ii) Signalling within partitioned or partially redundant
systems

The transmission of ticketed messages is done by the sender,
which selects tickets from the ticket spaces corresponding
to the relevant partitions.

7.8.4.4.2.2 Enforced Serialization

Let us assume that two senders A and B want the receivers to
process messages issued by A first and then messages issued
by B. This is done very simply by having A send to B a
"go-ahead" message after A has ticketed its last message.
There is no need to serialize the related activities
outside the system (for example, A waits until its activity
is over and then sends a message to B).

Senders A and B may also wish to initiate co-related
activities which, in a partitioned system, share at least one
partition. These activities are such that the message from
A should be serviced before the message from B, and also the
message from B should not be processed if the activity
initiated by A could not be completed.

The following protocol may be suggested. In the "go-ahead"
message, A stores the value of the ticket used for its
message. It is then only necessary to provide a flag and a
field in message headers, to be used as follows. When a
message M is received with the flag set, the receiver should
read the ticket value stored in the field. If the
corresponding activity could not be completed, message M is
discarded and the sender is told that its activity was not
initiated.
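
The receiver side of this protocol might look as follows. This is a minimal sketch, not the paper's implementation: the Message and Receiver classes, the field names depends_flag and depends_on, and the succeeds argument (standing in for whether an activity could be completed) are all hypothetical. Messages are assumed to be handled in ticket order, so A's message is always seen before B's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    ticket: int
    body: str
    depends_flag: bool = False          # the flag in the message header
    depends_on: Optional[int] = None    # the field: ticket of A's message


class Receiver:
    def __init__(self) -> None:
        self.completed: set[int] = set()   # tickets whose activity completed
        self.rejected: list[int] = []      # senders told "not initiated"

    def handle(self, msg: Message, succeeds: bool = True) -> bool:
        """Process msg unless it depends on an activity that did not complete."""
        if msg.depends_flag and msg.depends_on not in self.completed:
            self.rejected.append(msg.ticket)   # discard M, notify its sender
            return False
        if succeeds:
            self.completed.add(msg.ticket)
        return succeeds


# A's activity (ticket 10) fails at this receiver, so B's message is discarded.
r1 = Receiver()
a_ok = r1.handle(Message(10, "from A"), succeeds=False)
b_ok = r1.handle(Message(11, "from B", depends_flag=True, depends_on=10))

# At another receiver A's activity completes, so B's message is processed.
r2 = Receiver()
r2.handle(Message(10, "from A"))
b_ok2 = r2.handle(Message(11, "from B", depends_flag=True, depends_on=10))
```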

7.8.4.4.2.3 Performance Considerations

We want the signalling mechanism not to put any artificial
limitation upon the system performance. Consequently, this
mechanism should not be dependent upon the rotation
period of the token on the virtual ring. Senders should be
able to ticket and to issue messages at any time. This
means that senders should be allowed to select tickets not
only for pending messages but also for "future" messages,
i.e. messages to be created and issued between two
successive visits of the token.

Let p be a sender. At token visit #i, let C.i(p) be the
exact number of messages which are pending when the control
token is received, f.i(p) be the predicted number of future
messages, T.i(p) be the current value of the relevant ticket
space upon reception of the token and T'.i(p) be the new
ticket value when the token is sent on the ring.

Sender p is allowed to acquire C.i(p) + f.i(p) consecutive
tickets, starting from T.i(p). Ideally, during token
revolution #i+1, p needs exactly f.i(p) tickets. Clearly,
predictions are only predictions. Furthermore, the token
circulating speed is variable. Hence, it is necessary to
consider two possible situations:

- p is short of tickets: it has to wait for reception of
  the token.

- some tickets are not used when the token is back: let
  u.i(p) be the number of unused tickets. Because of the
  mutual independence principle, these tickets should be
  used up immediately. For that purpose, we provide for
  the utilization of a no-operation code. Exactly u.i(p)
  "fake" messages carrying a NOP code will be issued by p.


When needed, and as long as tickets are available, new
messages are issued.
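
A token visit under this single-ticket scheme can be sketched as a small Python function. The function name, its argument list and the tuple it returns are assumptions made for the illustration, and ticket arithmetic is shown without the modulo numbering.

```python
NOP = "NOP"


def on_token(T, pending, predicted, unused_from_last_visit):
    """One token visit at sender p under the single-ticket-space scheme.

    Returns (new ticket value, tickets acquired, NOP messages to issue).
    """
    # Tickets left over from the previous revolution are used up at once
    # by issuing "fake" messages carrying the no-operation code.
    nops = [(ticket, NOP) for ticket in unused_from_last_visit]

    count = pending + predicted             # C.i(p) + f.i(p)
    acquired = list(range(T, T + count))    # consecutive, starting from T.i(p)
    return T + count, acquired, nops


# p holds 2 pending messages, predicts 3 future ones, and still has
# tickets 97 and 98 unused from the previous revolution.
new_T, acquired, nops = on_token(100, pending=2, predicted=3,
                                 unused_from_last_visit=[97, 98])
```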

Probably, this will achieve a good parallelism between
senders, but it is not clear whether or not this will result
in a good average response time. Response time for a given
sender is dependent on how fast predecessors use up their
tickets.

Should such an interference be judged unacceptable, another
solution is needed.

What we would like to build is a mechanism whereby current
pending messages and future messages are distinguishable, so
that current pending messages for any sender receive tickets
"smaller" than those given to future messages.

Let us make it clear that we do not attempt to build a
perfect chronological ordering of messages. We only try to
achieve some system-wide statistical FIFO service so that
the average response time for every sender can be kept below
a reasonable value.

The way this can be done is rather simple. It is only
necessary to maintain two ticket values, T and θ, in the
token instead of one (or two arrays instead of one). T, as
above, is to be used for ticketing current pending messages
and θ for ticketing future messages. By the time the token
is back in p, only one of the three following conditions can
hold:


- u.i(p) = C.i(p) = 0 (ideal case)

- C.i(p) messages are waiting because p is lacking
  tickets: u.i(p) = 0, C.i(p) > 0 (under-estimation)

- u.i(p) tickets are still available: u.i(p) > 0,
  C.i(p) = 0 (over-estimation).

A requirement regarding the ticketing function is that the
two sets of numbers being used to assign a value to T and θ
should not overlap.

Two numbering cycles N(T) and N(θ) should be chosen so that
ticket lifetime is convenient (see computations below).

As T-ticketed messages and θ-ticketed messages will be
received interleaved by receivers, it is necessary to
provide for some means whereby receivers are able to decide
when to stop processing T-ticketed messages and start
processing θ-ticketed messages, as well as the reverse.

Such a "switching" should correspond to a complete
revolution of the token on the virtual ring. We need a
sender to flag the corresponding T and θ ticket values.

That sender could be x such that successor(x) < x. Due to
the properties of the virtual ring, this processor is unique
and always exists.
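
The uniqueness claim can be checked with a short Python sketch. It assumes that the virtual ring orders processors by increasing identifier, with the highest identifier wrapping around to the lowest, and that the ring holds at least two processors; the function names are hypothetical.

```python
def successor(ring, x):
    """Next processor on a virtual ring ordered by increasing identifier."""
    ordered = sorted(ring)
    return ordered[(ordered.index(x) + 1) % len(ordered)]


def flagging_sender(ring):
    """The processor x with successor(x) < x: the ring's wrap-around point."""
    candidates = [x for x in ring if successor(ring, x) < x]
    assert len(candidates) == 1   # exactly one such x for two or more members
    return candidates[0]


# With processors 3, 5, 7 and 12, only 12 has a smaller successor (3).
first = flagging_sender([7, 3, 12, 5])

# If 12 fails and the ring is reconfigured, 7 takes over the role.
after_failure = flagging_sender([7, 3, 5])
```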

The algorithm to be followed by sender p upon reception of
the token is described below (+ and - operations are modulo
N(T) or N(θ)).


BEGIN

  IF suc(p) < p AND C.i(p) = 0 THEN
    BEGIN
      C.i(p) := 1;
      create Fake message
    END;

  IF C.i(p) > 0 THEN T'.i(p) := T.i(p) + C.i(p)
    (acquisition of tickets #T.i(p), ..., #T.i(p) + C.i(p) - 1);

  IF u.i(p) > 0 THEN
    send u.i(p) Fake messages (ticketed with the u.i(p)
    highest θ-tickets obtained during the last
    token visit);

  assign a value to f.i(p);

  IF suc(p) < p AND f.i(p) = 0 THEN
    BEGIN
      f.i(p) := 1;
      create Fake message
    END;

  θ'.i(p) := θ.i(p) + f.i(p)
    (acquisition of tickets #θ.i(p), ..., #θ.i(p) + f.i(p) - 1);

  IF suc(p) < p THEN Flag messages carrying tickets
    #T.i(p) + C.i(p) - 1 and #θ.i(p) + f.i(p) - 1;

END
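
The sender steps can be restated as a small Python function. This is a sketch under stated assumptions rather than the paper's code: the function name, its arguments and the dictionary it returns are hypothetical, the prediction f.i(p) is passed in by the caller, and both numbering cycles are given an arbitrary length of 2**16.

```python
def sender_on_token(T, theta, pending, predicted, unused,
                    wrap_point, n_t=2**16, n_theta=2**16):
    """One token visit at sender p; returns the decisions as a dict.

    T, theta   : ticket values read from the token
    pending    : C.i(p), messages waiting for a ticket
    predicted  : f.i(p), estimate chosen by the sender
    unused     : theta-tickets left over from the previous visit
    wrap_point : True when suc(p) < p
    """
    c = pending
    if wrap_point and c == 0:
        c = 1                      # one fake message so a flag can be carried
    t_tickets = [(T + k) % n_t for k in range(c)]

    nops = list(unused)            # leftover tickets go out as fake messages

    f = predicted
    if wrap_point and f == 0:
        f = 1                      # same reason, for the theta numbering
    theta_tickets = [(theta + k) % n_theta for k in range(f)]

    # Only the wrap-around sender flags the last ticket of each kind.
    flagged = (t_tickets[-1], theta_tickets[-1]) if wrap_point else None
    return {
        "T": (T + c) % n_t,                 # T'.i(p), sent on with the token
        "theta": (theta + f) % n_theta,     # theta'.i(p)
        "t_tickets": t_tickets,
        "theta_tickets": theta_tickets,
        "nops": nops,
        "flagged": flagged,
    }


ordinary = sender_on_token(10, 20, pending=2, predicted=3,
                           unused=[18, 19], wrap_point=False)
idle_wrap = sender_on_token(10, 20, pending=0, predicted=0,
                            unused=[], wrap_point=True)
wrapped = sender_on_token(65535, 0, pending=2, predicted=1,
                          unused=[], wrap_point=False)
```

The idle_wrap case shows why the fake messages exist: even with nothing to send, the wrap-around sender must issue one message of each kind so that receivers see a flag once per revolution.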
The algorithm to be followed by a receiver is given below.
Notations:

X stands for either state T ("current") or
state θ ("future");

X" = T if (X = θ),
     θ if (X = T);

t(X) is a local variable containing the ticket value of the
last processed message, i.e. t(T) or t(θ).


WHEN IN STATE X DO

  LOOP: Scan for, or wait for reception of, message
        X-ticketed t(X)+1:

    CASE1 (X-ticket t' > t(X)+1 (mod) is received):

          Record request;

    CASE2 (X"-ticket is received):

          Record request;

    CASE3 (X-ticket t(X)+1 is present or received):

          BEGIN Initiate processing;
            IF message t(X)+1 is flagged
            THEN switch to state X"
            ELSE t(X) := t(X)+1

    CASE4 (timeout):

          Mark itself out of synchronization and initiate a
          recovery procedure.
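
A receiver following these cases might be sketched in Python as below. Several points are assumptions rather than statements of the paper: the class and method names are hypothetical, the timeout case and the modulo comparison are omitted, ticket numbering starts at 0, and t(X) is advanced past a flagged message before the state is switched, since the listing does not make this last point clear.

```python
class Receiver:
    def __init__(self, start_state="T"):
        self.state = start_state
        self.last = {"T": -1, "theta": -1}       # t(X): last processed ticket
        self.recorded = {"T": {}, "theta": {}}   # requests recorded for later
        self.processed = []

    def receive(self, kind, ticket, flagged=False):
        # CASE1 and CASE2: anything not immediately expected is recorded.
        self.recorded[kind][ticket] = flagged
        self._drain()

    def _drain(self):
        while True:
            x = self.state
            expected = self.last[x] + 1
            if expected not in self.recorded[x]:
                return                            # keep waiting
            flagged = self.recorded[x].pop(expected)
            self.processed.append((x, expected))  # CASE3: initiate processing
            self.last[x] = expected               # assumed: always advance
            if flagged:                           # end of a token revolution
                self.state = "theta" if x == "T" else "T"


r = Receiver()
r.receive("theta", 0, flagged=True)   # future message arrives early
r.receive("T", 1, flagged=True)       # out of order: T-ticket 0 still missing
r.receive("T", 0)                     # unblocks everything recorded so far
```

All T-ticketed messages of the revolution are served before the θ-ticketed message, even though the latter arrived first.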


A simple way to provide for two separate numbering schemes
of equal length is to use one bit to distinguish between
T-tickets and θ-tickets. However, one should mention that,
if predictions are not too inaccurate, θ-tickets are to be
used up more rapidly than T-tickets. An equal share of
the ticket number space may then not be the best solution.

We will discuss only briefly the issue of fairness in
estimating f.i(p). We consider two cases:

- senders are processors (maxis, minis, micros)
  cooperating within a distributed computing
  system to be viewed as a unique system by users.
  Algorithms to be followed by senders are
  designed by system builders, who are responsible
  for choosing convenient values for f.i(p).

- senders are computers connected on a computer
  network. Over-estimation is costly to senders
  because (i) processing wasted in handling NOP
  messages cannot be used to process useful
  messages (throughput is lower), (ii) a sender is
  billed for messages carrying NOP code and for
  the corresponding processing in the distant
  computer.


Because of the "pipe-line" nature of this mechanism, there
will be no interruption of message transmission. What this
means is that receivers may be kept as busy as desired. If
used cleverly, the signalling mechanism using anticipation
can achieve any desired throughput.


Lifetime

For 16 bit tickets, values are re-used after 65 seconds if
ticketed messages are issued every millisecond for the whole
system, after 18 hours and 12 minutes if ticketed messages
are issued every second.

For 32 bit tickets, lifetime is much longer. Values are
re-used respectively after 1 hour and 12 minutes, 119 hours or
136 years when ticketed messages are issued every
microsecond, 100 microseconds or second in the whole system.
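
These figures follow from the size of the ticket space (2 to the power of the ticket width) multiplied by the interval between ticketed messages. The Python sketch below reproduces them with integer microseconds; the function name is hypothetical, and a year is taken as 365 days.

```python
def reuse_period_us(bits: int, interval_us: int) -> int:
    """Microseconds before a ticket value is re-used."""
    return (1 << bits) * interval_us


US_PER_SECOND = 1_000_000
US_PER_MINUTE = 60 * US_PER_SECOND
US_PER_HOUR = 60 * US_PER_MINUTE
US_PER_YEAR = 365 * 24 * US_PER_HOUR

# 16 bit tickets
sec_16_ms = reuse_period_us(16, 1_000) // US_PER_SECOND            # one per ms
min_16_s = reuse_period_us(16, US_PER_SECOND) // US_PER_MINUTE     # one per s

# 32 bit tickets
min_32_us = round(reuse_period_us(32, 1) / US_PER_MINUTE)          # one per us
hours_32_100us = reuse_period_us(32, 100) // US_PER_HOUR           # per 100 us
years_32_s = reuse_period_us(32, US_PER_SECOND) // US_PER_YEAR     # one per s
```

Here min_16_s is 1092 minutes, that is 18 hours and 12 minutes, and min_32_us is 72 minutes, that is 1 hour and 12 minutes.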


7.8.5  Conclusion 

In this paper, a solution to the problem of multiple-path
signalling in distributed computing systems has been
described. This solution is based on the utilization of a
particular control structure which can achieve a distributed
and resilient generation of sequential identifiers. In
addition to solving the mps problem, this solution can be used
in distributed systems which should be resilient and where
unique names need to be generated dynamically. Also, a
side-effect of this solution is to allow for a safe
detection of duplicate messages at a high level in the system.
SECTION 8

SUMMARY AND FUTURE DIRECTIONS


8.1 GENERAL FINDINGS AND CONCLUSIONS


The idea of a process has not been fully absorbed by
programming languages or by modern hardware. Consequently,
the concept of an abstraction of a process and its support
is left to the realm of operating systems (which sit between
the language and the hardware), resulting in little or no
standardization of a "process" (especially when compared to
the level of standardization enjoyed by other features or
aspects of higher level languages and hardware).
Nevertheless, as this report has illustrated, the process
concept is becoming central to the design of computer
systems at both the hardware and software levels. There are
many reasons for this development, probably the two most
important ones being: (1) the decomposition of systems and
applications problems into sets of cooperating parallel
processes for greater modularity, functionality,
flexibility, and maintainability, and (2) the increasing
cheapness of processors and memory, allowing the assignment
of processes to processors in an economical way.

As processes have become "cheaper" to create, maintain, and
destroy, the flexibility, scope, power, and economy of
interprocess communication (IPC) mechanisms have become an
important key to the effectiveness of multi-process systems
in general, and highly distributed systems in particular.
However, there currently exists a wide variety of mechanisms
for interprocess communication, resulting in what one
researcher [SALT 79] has termed the "IPC Jungle". Perhaps
the major reason for such a variety comes from a desire to
provide in one set of primitives all of the following
capabilities:

1) Flexible process and/or data synchronization
   tools,

2) Data transfer mechanisms, and

3) Communication control and error recovery
   mechanisms.

Surprising to some researchers at the workshop was the lack
of attention paid to security, fault tolerance, and error
recovery; however, this may be taken as an indication of the
general state of affairs of a young technology. In such
cases, attention is usually first focused on achieving a
certain level of functionality before much effort is devoted
to engineering those features that make the technology
robust enough to be put into wide-spread use.

Finally, dissemination of information about IPC techniques
and options with respect to both implementation and
performance has been extremely poor in the past, and there do
not appear to be any immediate advances being made on this
aspect of the problem.

8.2 WORKSHOP SUMMARY


Below is a summary of the major focus areas of the workshop
and their conclusions.


8.2.1 Addressing, Naming, and Security

Many systems have inadequate facilities for identifying
names of processes within the same host, let alone for
processes residing on different hosts. Many existing
systems almost totally sidestep the naming issue. Part of
the problem stems from an inconsistent view of the
relationship between the set of allowable names for files,
devices, processes, users, mailboxes, generic system
services, and specific system services. As Livesy pointed out
during the workshop, the concept of the size of the naming
universe (of which the system forms a part) is implicit in
the system at a very deep level. One is forced to choose
between "add-on" naming techniques such as:

    /net/A/resource

which are not location independent on the one hand, and a
more or less complete redesign of the naming architecture on
the other hand. UNIX is an example of a system that makes
assumptions about the size of the universe. Until this
problem is settled, the design of specific interprocess
communication primitives cannot focus on the set of fundamental
objects that must be dealt with. This is a difficult issue,
since it is here that many of the system security issues
must also be addressed.


8.2.2 Synchronization


Clearly, a major function of interprocess communication is
to provide either explicit or implicit synchronization
between processes and/or access to shared data. Early forms
of interprocess communication depended only on the correct
use of explicit synchronization primitives for sharing
objects (usually sections of main memory). In some systems,
temporary files served as synchronizing points between job
steps (implicit), while in other systems, processes
explicitly exchange data by signaling. Whether
synchronization primitives should be explicit or implicit is
still very much an open question.

It is also becoming clear to some of the researchers in the
field that error recovery may be integral to the question of
synchronization. Visibility of the state of a computational
process is at the heart of the synchronization and error
recovery issues. Concern over the "atomicity" of an
operation is becoming more of a focal point for distributed
systems as the dimensions of time and space for
computational operations begin to change by orders of
magnitude. This concern is reflected in the recent
literature concerning synchronization in distributed systems
(see the 1978-79 references), and in some of the recent
theoretical work. However, the effectiveness of these
proposals with current technology will remain largely
unknown until prototype implementations appear.


8.2.3 Mechanisms


At least ten currently used IPC mechanisms were identified,
along with some estimate of their support of certain
qualities deemed desirable by the workshop attendees. There
was more agreement on the set of desirable qualities than
there was on which mechanisms fulfilled those qualities. It
was also obvious that none of the present mechanisms did
everything that everybody hoped for, which should tell us
that we have yet to obtain maturity of abstraction (in the
sense that the abstraction of a subroutine is well
understood) for a general IPC mechanism. For these reasons, it
seems reasonable to keep exploring new mechanisms while we
also continue to build real-world systems with the best
techniques we have heard about.


In addition, it appears important to devote some additional
work to selecting the factors to be utilized in assessing
trade-offs between provability and convenience of
implementation and use. Many of the mechanisms discussed at
the workshop present enormous obstacles to rigorous proof.


8.2.4  Theoretical  Work 

Distributed systems present new theoretical challenges to
researchers, largely because the specification of a
distributed computation involves time and space boundaries
that are difficult to define, and may be constantly
changing. Variability in speeds and state definition may
even make a "system" inherently non-deterministic. Such
difficulties throw much of the previous work in program
specification and correctness into disarray when applied to
distributed systems. There is little agreement whether to
approach the problem using "state-free" or "state-based"
descriptions, or whether to grapple with atomic or
non-atomic actions, or even what are relevant measures of "time"
and "space". Once again, this seems to reflect the
immaturity of the whole field of distributed systems.

8.3 CONCLUSIONS IN RETROSPECT


Lastly, we should be honest as to how well we achieved our
original goals. Each goal is repeated here with a short
comment as to our view of the level of success we enjoyed
and the reasons for it.

1) Assess the present state-of-the-art for IPC
   mechanisms in distributed data processing
   systems.

   *** Successful. A reading of many of the
   enclosed working papers and the references
   should adequately reflect the present
   state-of-the-art.

2) Identify the data available on the actual
   performance of various IPC policies and
   mechanisms.

   * Unsuccessful. An attempt was made; however,
   lack of agreement on appropriate measures
   (see mechanisms) has probably prevented any
   great data base being built up.

3) Assess the potential value of various IPC
   mechanisms in satisfying the operational and
   performance requirements for highly
   distributed systems.

   ** Moderately successful. Many of the
   advantages and disadvantages of the functional
   aspects of current mechanisms in use were
   examined, although, obviously, more thorough
   operational and performance assessments must
   await more "distributed" implementations.

4) Identify shortcomings in the present
   state-of-the-art and identify promising areas for
   further research and experiments on this
   subject.

   *** Successful. A reading of the report
   reflects many of the shortcomings of current
   techniques. Promising areas for further
   research were not specifically addressed in
   all areas; however, they are indirectly
   identified by many of the authors.

5) Identify possible standardization levels in
   IPC design.

   * Unsuccessful. The plethora of available
   abstractions and the notable lack of any
   single outstanding set useful for distributed
   applications reflect the immaturity of the
   field and possible premature standardization.

System Overview," Transactions on Software
Engineering, vol. SE-2, no. 4, December 1976, pp.

ming System," Communications of the ACM, vol. 13,
no. 4, April 1970, pp. 238-50.

[BURN 78] J. E. Burns, M. J. Fischer, P. Jackson, N. A.
Lynch, and G. L. Peterson, "Shared Data

Introduction to Local Area Networks," Proceedings
of the IEEE, vol. 66, no. 11, November 1978, pp.

[DOWS 78] M. Dowson, "The DEMOS Multiple Processor Technical
Summary," National Physical Laboratory Technical
Report, NPL Report 101, April 1978, Teddington,
Middlesex TW11 0LW, UK.




Section  9  SELECTED  READINGS  AND  REFERENCES  Page  139 


[ELLI 77] Clarence A. Ellis, "A Robust Algorithm for Updating
Duplicate Databases," Proceedings of the
Berkeley Workshop on Distributed Data
Management and Computer Networks, May 25-27, 1977.

[ESWA 76] K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L.

August 1978, pp. 666-677.

1978.

Chicago, Illinois, November 1978, pp.